SHAWN POWERS


Linux Means 
Business—and Fun 


This month, we focus on Linux in the enterprise.
No, not that Enterprise (I'm pretty
sure I used the Star Trek joke last year). 
We're talking about large deployments of Linux. 
One awesome thing about such an issue focus is 
that it really doesn't apply only to folks administering
huge networks. Most of the features scale
down quite nicely for those of us administering a
closet in our bedroom, a server rack in the basement
or just a database on the couch. In fact,
Reuven M. Lerner talks about CouchDB this 
month. Silly name aside, CouchDB is becoming 
more and more popular as a non-relational 
database. If you like what Canonical is doing 
with its syncing technology in Ubuntu One, 
you'll want to check out the underlying framework
it uses: first and foremost, CouchDB.

Bill Childers has his head up in the clouds 
again, and this time he shows us how to set 
up our own cloud with Eucalyptus. Sure,
Amazon and that ilk are great if you don't 
have your own servers, but if you already own 
the hardware, why not make your own cloud? 
Bill shows how. If you already have your 
servers set up, perhaps you're just at the 
point when you need to administer them, 
whether they're in the cloud or not. Kyle 
Rankin, an enterprise sysadmin himself, presents
a few more tricks of the trade with a
series of crafty ways to use SSH. 

Perhaps you're not interested in setting 
up a cloud, and you just want to make your 
existing servers (or cloud) do something useful.
We have plenty of that this month too.
Dave Taylor continues his series on scripting 
HTML forms, Michael Nugent talks about 
MySQL replication, and Daniel Bartholomew 
discusses SQL vs. NoSQL (where we get to 
watch Daniel battle his own version of 
Point/Counterpoint). Regardless of what services
you offer on your servers, you need to
be able to monitor them. Also in this issue, 
Paul Tader shows us Zabbix, a cross-platform 
monitoring tool that takes much of the sting 
out of setting up monitoring. 

For some of us, Linux in the enterprise is
only a dream we'd like to see come to
fruition. We labor over our Linux machines at
home, yet at work, we're stuck with proprietary
operating systems and closed-source
programs. Jeramiah Bowling compares
Microsoft Windows and Linux in the enterprise,
pulling the rug out from under some
common misconceptions and providing some 
real data we might use to demonstrate where 
Linux might be a better fit. Unfortunately, 
even if the logic is clear, sometimes politics 
are a bigger stumbling block than feasibility. 
Avi Deitcher writes about the whole process 
of introducing open-source software and 
ideals to business folks. It's often a difficult 
undertaking, and Avi has some smart tips. 

What if enterprise Linux doesn't interest 
you at all? Don't worry; we understand. And, 
even if you work on Linux in the enterprise, 
sometimes it's nice to come home and enjoy 
your own Linux machines without worrying 
about clouds, monitoring, scalability and 
other 9-to-5-sounding words. Jono Bacon 
introduces us to Quickly this month. Quickly 
is a program that supplies templates, tools 
and a framework to allow quick desktop 
application development. If you've ever wanted 
to make your own applications, but didn't 
want to go through the frustration of the 
traditional development model, Quickly might
be right up your alley. 

We also have a review of the Coyote Point 
Application Balancer, Dirk Elmendorf covers 
desktop document scanning, and John Knight 
introduces some fresh projects. Plus, there are all 
the other articles that make Linux Journal such 
a fun magazine for geeks. So whether you're 
building your own cloud, tweaking your existing 
infrastructure or just killing time on the command 
line, this issue has you covered.


Shawn Powers is the Associate Editor for Linux Journal. He's also the Gadget 
Guy for LinuxJournal.com, and he has an interesting collection of vintage
Garfield coffee mugs. Don't let his silly hairdo fool you, he's a pretty
ordinary guy and can be reached via e-mail at shawn@linuxjournal.com.
Or, swing by the #linuxjournal IRC channel on Freenode.net.
Tech Tip Video Tip

I recently started enjoying Shawn Powers' 
tech tip videos [on LinuxJournal.com], 
and I have one suggestion for a possible 
improvement in usability/accessibility. 

I don't think a full transcript is necessary,
but a bulleted list of files, directories,
commands and URLs mentioned
in them might be cool. Thanks for 
any consideration. 

Dallas Legan 

I have gotten many similar suggestions 
for tech tip videos, and something 
along the lines of "show notes" for a
one-minute video might make sense. 

The tech tips have been floundering of 
late (hopefully, by the time this prints 
that will be different) due to my house 
fire, but once I get back on track, I'll try 
to include appropriate text. If nothing 
else, it will make searching for the 
videos easier!—Shawn Powers 

Not Writing Filesystems 
Often Is a Feature! 

In the March 2010 Letters, Peter 
Bratton complained how explore2fs 
could not write NTFS files and 
recommended www.fs-driver.org, 
a non-open-source driver. Perhaps 
not writing foreign format filesystems 
is a feature, not a limitation? 




All software can be expected to have 
bugs, and filesystem bugs are especially 
devastating, as they easily can destroy 
a system. If I need interoperability 
between Windows and Linux, I often 
use an intermediate FAT filesystem,
where I can put files from both 
systems, and I won't be devastated 
if I uncover a bug. 

It often is a good idea to mount foreign 
format filesystems -r (read-only). I do 
that on dual-boot machines when I 
run Linux. 

marty 

Just-in-Time Content 

Thanks for a great magazine. The 
April 2010 issue on Software 
Development had an article on 
Selenium. I had just started looking 
for a Web automation and testing 
tool, and this article and our corresponding
use of Selenium has saved
me a lot of time. Thanks for content 
that applies directly to what we do. I 
love just-in-time editorial content. I 
can't wait to see what I will need for 
next month. Keep up the good work. 

John Beauford 

Re: Legally Using Linux 

I wanted to comment on Luke's letter 
asking about Linux licensing on page 
12 of the May 2010 issue of LJ. In his 
example, he mentions Red Hat, and says 
that it is hard to determine the licenses 
for everything and ponders if all the 
work is left up to the user. 

I can tell you from the perspective of a 
Fedora, Red Hat and CentOS user that 
determining the license something is 
under is easy. For installed software, 
just do this: 

rpm -qi {packagename} 

That queries the information about the 
package, and one of the fields is the 
software license. 

If a package isn't installed but you have
a copy of the .rpm where you can get
to it, just add a p to the flag: 

rpm -qip {package-filename}.rpm 
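
When only the license is wanted, that field can be filtered out of the query output. What follows is a minimal sketch, not part of the original letter: the helper name license_field is made up for illustration, the sample text stands in for real rpm output, and it assumes the field is labeled "License" followed by a colon (the exact layout differs between rpm versions).

```shell
# license_field: print only the license from "rpm -qi"-style text on stdin.
# Hypothetical helper; it strips everything up to and including "License :".
license_field() {
    sed -n 's/.*License *: *//p'
}

# Sample text standing in for real rpm output (illustrative, not captured).
printf 'Name        : bash\nLicense     : GPLv3+\n' | license_field
```

On a real system, this would be used as rpm -qi bash | license_field. rpm's own query format option, rpm -q --qf '%{LICENSE}\n' bash, gets at the same field without any filtering.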

Red Hat has lawyers, and it even has 
a person investigating licenses for the 
Fedora Project. Red Hat takes licenses 
and licensing very seriously and shies 
away from things that are known to 
be licensed under questionable terms. 
So, for example, Red Hat doesn't
ship with any Adobe products 
pre-installed, nor MP3 playback nor 
decoder, just to name a few. There 
have been several occasions when 
Fedora has dropped a package 
because there was some uncertainty 
about the license. If you think Red 
Hat puts its customers at risk with the 
software it ships, you are mistaken. 
Red Hat has done its homework (see 
www.redhat.com/legal/open_source_assurance_agreement.html).

For an example article on Fedora's 
license guy doing his homework, see 
the following Linux Weekly News 
article: lwn.net/Articles/312262. 
That certainly isn't the only article 
on the subject. There also is a project 
that has a goal of aiding in license 
verification, although it has been a 
while since I've read about it. I won't 
say anything other than check out 
the Web site: fossology.org. 

I think licensing on Linux is much easier 
than the EULAware no one reads on 
most proprietary OSes. I'm also not 
aware of any business nor end user 
using a mainstream Linux distribution 
that has gotten sued for license 
violation of products installed from the 
distro's stock repository. It just doesn't 
happen. That might change if Microsoft 
decides to go after Linux users for the 
violations of its patents, as it has 
claimed in the media. 

Scott Dowdle 

We discussed this letter in the Linux 
Journal Insider podcast as well [see
www.linuxjournal.com/podcast/lj-insider for our monthly podcast on
each new issue]. Thank you for the info. 
It's greatly appreciated. — Ed. 

Dave Taylor's Trap, Part II 

In the May 2010 issue, a letter titled 
"Dave Taylor's Trap" recommended 
against setting a trap on 0 (zero). A 
trap on 0 is quite useful. It is a trap 
on EXIT. The bash(1) man page states 
"If a sigspec is EXIT (0), the command 
arg is executed on exit from the 
shell." Trap on EXIT (0) is available 
in other Bourne shell-compatible 
shells, such as sh, ksh, dash and zsh. 

I recommend trapping on "0 1 2 3
15", as in the following example, to
remove a tmp file when a script exits 
or is killed. The shell does not execute 
a trap on EXIT (0) while inside another 
trap, avoiding what otherwise would 
be a recursive loop if you left the 
EXIT trap set while trapping on it: 


tmpfile=`tempfile` || exit 1

trap 'rm -f $tmpfile; exit $exitval' 0 1 2 3 15

# do some work with $tmpfile

exitval=0

Paul Jackson 

Dave Taylor's Trap, Part III 

A letter in the April 2010 issue complained
about Dave Taylor using signal
0 with the shell trap command, and 
he apologized for the "error". This is 
not an error; including signal 0 is a 
common and extremely useful feature 
of the trap built-in. You and the letter-writer
are correct that there is no
actual signal 0, but in the case of 
trap, signal 0 specifies the event of 
normal termination of the script. 

Therefore, one can use trap
"cleanup_code" 0 to invoke
cleanup code upon normal exit. I use 
it all the time to get rid of temp files 
and other debris. Thanks for your 
helpful column. 

bruce 

Dave Taylor replies: <slaps forehead> 


Thanks! I knew there was a reason 
that the trap 0 was a good idea, I just 
spaced on what it was. Now I can 
sleep well at night. 
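
For readers who want to try the idiom from these letters, here is a small sketch. It is one way to arrange it, not the letter-writers' exact code: the function name run_with_cleanup is made up, mktemp is assumed in place of the tempfile utility, and the work is done in a subshell so the trap fires when that subshell ends.

```shell
# Sketch of the EXIT-trap idiom from the letters above. mktemp is an
# assumption here (the original letter used the tempfile utility).
run_with_cleanup() {
    (
        tmpfile=$(mktemp) || exit 1
        exitval=1                      # assume failure until the work is done
        trap 'rm -f "$tmpfile"; exit $exitval' 0 1 2 3 15

        echo "scratch data" > "$tmpfile"   # stand-in for real work
        echo "$tmpfile"                # report the name so callers can check

        exitval=0                      # normal completion
    )
}
```

Running f=$(run_with_cleanup) and then testing for "$f" should show that the file is already gone, because the trap on 0 ran when the subshell exited.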

Using Text Editors for Writing 
Code 

Dave Taylor's comment, in his May 
2010 Work the Shell column on 
converting HTML forms, about using 
vi for a few code hacks, started me 
thinking about using text editors for 
writing code. Just how important is 
the choice of text editors when working
with code? Emacs and vi appear
to be the most frequent recommendations.
Why are they so popular or
recommended so often, and what 
about gedit, joe, Leafpad, pico or nano 
as a text editor for hacking code? 

Philip S. Ruckle, Jr. 

XDMCP 

Regarding Michael J. Hammel's 
"Running Remote Applications" in 
the February 2010 issue, the author 
seems to mistake the purpose and 
workings of XDMCP. XDMCP allows 
you to select an X client from a list 
(using the XDMCP chooser) and to 
log in with the remote display (X server) 
preconfigured for the session. 

On page 61, the author states, "The 
use of the -display option is tied to 
the configuration of XDMCP on the X 
server." This is not true; the -display 
option can be used at any login shell 
and can be pointed at any available 
X server. For example, a user can use 
an SSH login and enter the -display 
option to point at a different server. 

In fact, the -display option can be 
used to start an X client on an arbitrary
X server. The only configuration
required of the X server is that TCP 
connections must be allowed. 

It appears from the article that TCP 
connections are enabled only for 
gdm/kdm; enabling TCP connections 
to the X server (at least in Ubuntu 
Karmic) can be done by removing the
-nolisten tcp option shown in the
/etc/X11/xinit/xserverrc file.

It also is not necessary to switch 
runlevels to restart gdm (or kdm); the 
display manager is a special-purpose 
X server and can be "restarted" by 
killing the running xdm with a Ctrl- 
Alt-Backspace or by using the kill 
command at the command line. 

One thing that was not clarified is 
that VNC is essentially a new X server 
with a network-based remote display. 
VNC originally was designed in exactly 
this way: the code to actually present 
a display was removed from X and 
the networking facilities added in. 
Running VNC will not usually share 
the current display; this is done with 
other tools. 

David Douthitt 

Point/Counterpoint 

Regarding Kyle Rankin and Bill Childers' 
"/opt vs. /usr/local" in the March 2010 
issue, there are a few facts that are 
missing or misstated in the exchange 
between these two brilliant minds. 

First, the real contrast between 
/usr/local and /opt is the layout: /opt 
is used by putting all of the software's 
files in one directory, and /usr/local 
creates a new hierarchy with 
/usr/local/bin, /usr/local/sbin, 
/usr/local/etc and so on. This is hinted 
at but never explicitly stated. 

Second, /usr/local is, in fact, older than 
/opt; /opt came along with Solaris, 
whereas /usr/local predates Solaris. 

Third, none of the packages in Linux 
distributions put their files in /usr/local; 
rather, all put their files in /usr. Solaris 
packages put their files in /opt. If you 
use tar and gzip to compile from 
source, you'll find your files going 
into /usr/local. If you compile a 
custom Apache, you'll find the files 
in /usr/local (not /usr). 

Finally, consider that HP-UX in recent 
years switched from /opt to /usr/local. 
The path for files compiled into /opt
becomes quite large in most installations.
Adding /usr/local files means
adding two directories to the path: 
/usr/local/bin and /usr/local/sbin. 

FreeBSD and other BSDs use /usr/local 
exclusively for added software, just 
as Solaris uses /opt. Linux doesn't 
use /usr/local unless you compile 
your own software. 

Tell Kyle and Bill to keep up the 
good work. 

David Douthitt 

Kyle Rankin replies: Thanks for 
all of the extra background into 
/opt/ and /usr/local. I can tell you 
are an experienced and learned 
administrator, and not just because 
you agree with me. 


Bill Childers replies: Thanks for the 
historical insight! It may be worthwhile
to note that the Blastwave
Solaris folks do create their own bin, 
etc and lib directories under /opt/csw 
for some of the reasons you specify 
in your letter. I do appreciate the 
multiple OS point of view you 
referenced, as my esteemed colleague 
tends to see the world through 
penguin-colored glasses. 
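
The point about path length in the letter above can be illustrated with a short sketch. It assumes each package under an /opt-style tree keeps its programs in its own bin directory; the function name opt_path is hypothetical and nothing here comes from the original exchange.

```shell
# opt_path: print a PATH fragment with one entry per package found under
# an /opt-style tree. The function name and layout are illustrative.
opt_path() {
    root=$1
    result=
    for dir in "$root"/*/bin; do
        [ -d "$dir" ] || continue       # skip if the glob matched nothing
        result=${result:+$result:}$dir
    done
    printf '%s\n' "$result"
}
```

With this, PATH=$PATH:$(opt_path /opt) grows by one entry for every installed package, whereas the /usr/local layout needs only PATH=$PATH:/usr/local/bin:/usr/local/sbin no matter how much software is added.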

Using Telnet to Send E-mail 

Kyle Rankin's telnet e-mail works 
nicely (see Kyle's Upfront piece in 
the May 2010 issue), but the MAIL 
FROM: command does not conform 
to the actual SMTP RFC2821, so 
most SMTP servers reject it with 
a syntax error: 

MAIL FROM: <bill.gates@microsoft.com>

will do the trick with the 
additional "<address>"! 

Torsten 

Kyle Rankin replies: I must 
work with too many postfix 
servers, as they are more 
forgiving of the syntax. 

Thanks for the clarification, 
and you get extra points for 
referring to the RFC. 
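
As a rough sketch of the syntax under discussion, the function below prints the opening commands of an SMTP session with both addresses in angle brackets. The function name, the HELO hostname and the addresses are placeholders, not taken from the original article. The RFC grammar shows no space after the colon, although many servers tolerate one.

```shell
# smtp_envelope: print the opening commands of an SMTP session.
# Sketch only; the host and addresses passed in are placeholders.
smtp_envelope() {
    sender=$1
    recipient=$2
    printf 'HELO client.example.com\r\n'
    printf 'MAIL FROM:<%s>\r\n' "$sender"      # angle brackets are required
    printf 'RCPT TO:<%s>\r\n' "$recipient"
    printf 'DATA\r\n'
}

smtp_envelope sender@example.com recipient@example.org
```

The lines it prints are what would be typed, one at a time, into the telnet session after the server's greeting.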

Open Source for TV 
Broadcasting 

This message is targeted 
toward Doc Searls. It seems 
for many years he has attended 
the annual NAB show in 
April, and then writes about 
the silos. He might be interested
to learn more about
how several broadcasters
throughout the world, including
one of Europe's largest,
are using MLT for a playout
server and contributing to it. 
This toolkit/library also serves 
as the engine for the up- 
and-coming video editors 
Kdenlive and OpenShot. It's 
something for Doc to learn 
about if he is attending 
the NAB show this year 
(www.mltframework.org). 

Dan Dennedy 


Have a photo you'd like to share with
LJ readers? Send your submission to 
publisher@linuxjournal.com. If we run 
yours in the magazine, we'll send you 
a free T-shirt. 



Me, the cannabis activist/geek chick at a
recent cannabis-related event here in
Montana. (Thank goodness I brought my
LJ magazines with me.) Photos submitted
by Heather Masterson, Missoula, Montana.
UPFRONT


NEWS + FUN 


diff -u 

WHAT’S NEW IN KERNEL DEVELOPMENT 


Tablets, the New Netbooks


More and more progress continues 
to be made on eradicating the big 
kernel lock (BKL). We've now 
reached the stage where only 
relatively few parts of the kernel 
still depend on the BKL. Arnd 
Bergmann, who's been maintaining 
his own source tree specifically targeting
the BKL, recently announced
that his own work, and the contributions
of lots of other folks, had
removed the BKL from the entire 
core kernel, and it was now possible 
to build a kernel that had no 
instance of the BKL at all. There 
still are some high-profile drivers 
that rely on the BKL though (for 
example, USB and VFAT), as well 
as a lot of more obscure drivers. 
Arnd also acknowledges that some 
of his BKL-removal patches may be 
superseded by other people's efforts 
in a particular area. For example, he 
took the BKL out of the TTY layer, 
but Alan Cox has been planning 
to do work on the TTY layer himself 
that probably would go into the 
official tree instead of Arnd's work. 
But, the upshot of all this is that 
the kernel is likely to become much 
more friendly to threaded applications 
in the fairly near future. 

With the modern proliferation 
of virtual systems like VMware,
Xen and KVM, people want to write
code that supports their particular 
virtualization implementation within 
the host system. The result can be 
some duplicated features, and 
potentially an approach that favors 
the person's own preferred virtualization
system over the generic
services the kernel is supposed to 
provide. That seems to have happened 
recently, when VMware submitted 
some more work on its balloon 
driver. A balloon driver allows 
memory allocation to fluctuate, 
so the virtual system can release 
memory back to the host system, 
and then claim more memory later, 
at need. It's a polite way to be a
virtualization system.

In this case, however, Andrew 
Morton pointed out that an even 
more polite approach would be to 
extend the memory handling abilities 
the kernel already possesses. The 
code to handle system hibernation 
seemed to him like a natural starting 
point for that approach. The only 
drawback is that none of the virtual 
system developers had considered 
that possibility, so it would mean 
backtracking their plans. But, it 
seems like either that, or some 
similar extension of existing functionality,
will be the new direction,
at least for balloon drivers. 

One of the most interesting 
aspects of kernel development is 
the balance struck between letting 
people contribute in the best way 
they can and keeping a rein on the 
messiness that can creep into a 
project when a lot of people are all 
pounding on it together. Recently, 
Linus Torvalds caught Phillip 
Lougher copying some ugliness 
from include/linux/mm.h into other 
include files needed for SquashFS. 
Phillip knew there was a problem 
with that ugliness, but he'd cleaned 
it up as much as he could, and any 
further effort would involve a major 
detour from his SquashFS work.
And anyway, the mess already was
in the code, so it didn't seem like 
such a high priority to him. 

But, Linus adamantly refused to 
let the ugliness propagate further 
into the code. He wasn't blaming 
Phillip for it, but he asked Phillip to 
work on cleaning it up more, and 
asked H. Peter Anvin to get into 
it with him. The end result was a 
delay in accepting the SquashFS 
changes and a bit of a detour for 
Phillip, but the work was at least 
relevant to what Phillip wanted to 
do, and it was going to have a fairly 
large impact on the cleanliness of 
this part of the kernel. 

— ZACK BROWN 


Tablet computers aren't new, 
just like tiny-form-factor computers
weren't new. Much like
with the Netbook craze, the
new tablet computing craze
has much to do with money
and less to do with innovation.
Don't get me wrong. I
think we'll see tons of innovation,
but it will be driven by
consumers' pocketbooks (and
their willingness to open them)
as opposed to some amazing
concept in computer design.

I certainly thought 
Netbooks were the perfect 
place for Linux to gain a 
stronghold. Sadly, poor 
implementation by vendors 
and lack of a standard desktop
caused Linux to be the
ugly alternative to Windows— 
something that should make 
us all shudder. Maybe, just 
maybe, tablets will be our 
second chance. 

Certainly, Apple's iPad 
has made a huge jump start 
into the hearts of spendy 
Americans. This time around, 
however, the Linux community 
has something we didn't have 
before: Android. Love it or 
hate it, Google has managed 
to provide a rather standard 
platform that is designed to 
work on mobile devices— 
and tablets! 

So, to all my fellow 
Netbook owners who bought 
Netbooks just because they 
looked cool, let's buy 
tablets! I don't really care 
which one, but please, buy 
one that runs Linux. 

— SHAWN POWERS 
NON-LINUX FOSS


Linux continues to make headway in embedded devices, but for many devices, it's just too 
heavy, and out of the box, it doesn't have real-time support. 

NuttX is a Real Time Operating System (RTOS) 
for small- to moderate-size embedded systems. It 
strives to be standards-compliant (POSIX and ANSI) 
to the fullest extent possible for a deeply embedded 
environment. NuttX is fully preemptible and includes a 
filesystem, C library, networking and USB device support. 

NuttX has been ported to numerous 
platforms/architectures ranging from small 8-bit 
systems, such as the 8052 and the M68HC12, to 
larger 32-bit systems, such as the ARM Cortex-M3. 
NuttX can be built with Linux and with Cygwin. Depending on the options that are 
enabled, NuttX can be squished down to around 20K or so. Around 50K gives you 
room for a full-featured build. 

NuttX was first released in 2007 and is actively developed. It has had 49 releases since 
then and currently is at version 5.2. NuttX is hosted on SourceForge at nuttx.sourceforge.net 
and is licensed under a BSD license. 

— MITCH FRAZIER 



# apt-get moo
         (__)
         (oo)
   /------\/
  / |    ||
 *  /\---/\
    ~~   ~~
...."Have you mooed today?"...


Repo, Man 

Sure, it was a cheesy 1980s movie, but more important, I'd like to focus on the 
"Repo" part of it. As Linux users, software repositories are second nature to us. For 
new users, however, that's not the case. 

Take my father for instance. (Hi Dad!) He recently started using Linux on his desktop 
machine, and once he was comfortable with the base install, he wanted to try some other 
applications. As a longtime Windows user, he called me to ask where a person goes to 
download software, specifically Amarok. The concept of preloaded software repositories 
was foreign to him, but I hope a rather exciting one once I explained it. 

We often tout security, stability and freedom when we talk about why Linux is so 
great. It's funny that the little things we take for granted, things like "convenience", 
already are built in to our favorite operating system. I rambled on-line about this as 
well, and because my space here is limited, feel free to add your two cents on our 
Web site: www.linuxjournal.com/content/linux-where-crapware-goes-die. 
apt-get install a_great_day! 

— SHAWN POWERS 


1. Number of machines registered at The Linux 
Counter: 136,986 


2. Percent of Linux Counter machines running 2.0 
kernels: 6.3 

3. Percent of Linux Counter machines running 2.2 
kernels: 6.8 

4. Percent of Linux Counter machines running 2.4 
kernels: 6.5 

5. Percent of Linux Counter machines running 2.6 
kernels: 92.2 

6. Average uptime for machines registered at The 
Linux Counter (in days): 70.3 

7. Longest uptime of a machine registered at The 
Linux Counter (in days): 1,856.4 

8. Number of DistroWatch-listed distros whose
names end in the letters "ix": 21

9. Number of DistroWatch-listed distros whose
names contain the acronym "OS": 14

10. Number of DistroWatch-listed distros whose
names contain the acronym "BSD": 9

11. Number of DistroWatch-listed distros whose
names contain the letters "buntu": 10

12. Number of DistroWatch-listed distros whose
names contain accented characters: 3

13. Number of DistroWatch-listed distros whose
names contain digits (0-9): 8

14. Number of DistroWatch-listed distros whose
names contain exactly one digit (0-9): 3

15. Number of DistroWatch-listed distros whose
names begin with a digit (0-9): 2

16. Number of DistroWatch-listed distros whose
names begin with the letter "Q": 1

17. Number of DistroWatch-listed distros whose
names begin with the letter "S": 30

18. Number of digits that don't appear in any
DistroWatch-listed distro name: 5

19. Number of letters that don't appear as the first
letter in any DistroWatch-listed distro name: 0

20. Ranking of stallman.org in a Google search for
"RMS": 4


Sources: 1-7: counter.li.org | 8-19: DistroWatch
+ grep, etc. | 20: Google
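
For the curious, counts like items 8-19 can be produced with a one-line grep over a list of names. The sketch below is only an illustration: the helper name count_matching is made up, and the five names are a small sample, not the data behind the figures above.

```shell
# count_matching: count names on stdin that match a pattern, ignoring case.
count_matching() {
    grep -ci -- "$1"
}

# Small sample list, one distro name per line (not the index's real data).
names='Knoppix
Ubuntu
Kubuntu
PC-BSD
Slackware'

printf '%s\n' "$names" | count_matching 'ix$'     # names ending in "ix"
printf '%s\n' "$names" | count_matching 'buntu'   # names containing "buntu"
```

With this sample, the two counts come out as 1 and 2. Note that grep -c returns a non-zero exit status when the count is zero.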
[UPFRONT]


Wi-Fi on the Command Line 


More people than ever are using wireless networks as their primary networking
medium. Great programs are available under X11 that give users
a graphical interface to their wireless cards. Both GNOME and KDE include 
network management utilities, and a desktop-environment-agnostic utility 
called wicd also offers great functionality. But, what if you aren't running 
X11 and want to manage your wireless card? I don't cover how to install 
and activate your card here (for that, take a look at projects like madwifi 
or ndiswrapper). I assume your card is installed and configured properly, 
and that it is called wlan0. Most of the utilities mentioned below need
to talk directly to your wireless card (or at least the card driver), so they 
need to be run with root privileges (just remember to use sudo). 

The first step is to see what wireless networks are available in 
your area. A utility called iwlist provides all sorts of information
about your wireless environment. To scan your environment for 
available networks, do the following: 

sudo iwlist wlan0 scan

You'll see output resembling: 

Cell 01 - Address: 00:11:22:33:44:55
          ESSID:"network-essid"
          Mode:Master
          Channel:11
          Frequency:2.462 GHz (Channel 11)
          Quality=100/100  Signal level:-47dBm  Noise level=-100dBm
          Encryption key:off
          ...


The details (address and essid) have been changed to protect the 
guilty. Also, the ... represents extra output that may or may not be 
available, depending on your hardware. You will get a separate cell 
entry for each access point within your wireless card's range. For each 
access point, you can find the hardware address, the essid and the 
channel on which it's operating. Also, you can learn in what mode the 
access point is operating (whether master or ad hoc). Usually, you will 
be most interested in the essid and what encryption is being used. 
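
If all you want is the essid, channel and encryption state for each cell, the scan output can be boiled down with a little awk. This is a sketch under assumptions: the helper name summarize_scan is made up, and it relies on the field labels shown in the sample output above, which can vary with your driver and version of the wireless tools.

```shell
# summarize_scan: reduce "iwlist scan"-style text on stdin to one line
# per cell: ESSID, channel, encryption (on/off). Illustrative helper only.
summarize_scan() {
    awk '
        function emit() { if (seen) print essid, chan, enc; seen = 0 }
        /Cell [0-9]+ - Address:/ { emit(); seen = 1; essid = chan = enc = "" }
        /ESSID:/                 { split($0, q, "\""); essid = q[2] }
        /Channel[: ][0-9]/       { sub(/.*Channel[: ]/, ""); chan = $0 + 0 }
        /Encryption key:/        { sub(/.*Encryption key:/, ""); enc = $0 }
        END                      { emit() }
    '
}
```

You would use it as sudo iwlist wlan0 scan | summarize_scan. A line is printed when the next cell starts, so the order of the fields within a cell does not matter.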

Once you know what's available in your immediate environment, 
configure your wireless card to use one of these access points using 
the iwconfig utility to set the parameters for your wireless card. First, 
set the essid, which identifies the network access point you want: 

sudo iwconfig wlan0 essid network-essid 

Depending on your card and its driver, you may have the option 
to set the essid to the special value "any". In this case, your card will 
pick the first available access point. This is called promiscuous mode. 

You also may need to set the mode to be used by your wireless card. 
This depends on your network topology. You may have a central access 
point to which all of the other devices connect, or you may have an ad hoc 
wireless network, where all of the devices communicate as peers. You may 
want to have your computer act as an access point. If so, you can set the 
mode to master using iwconfig. Or, you simply may want to sniff what's 
happening around you. You can do so by setting the mode to monitor and 
passively monitor all packets on the frequency to which your card is set. 


You can set the frequency, or channel, by running:

sudo iwconfig wlan0 freq 2.422G

Or by running:

sudo iwconfig wlan0 channel 3 
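
The two commands above are equivalent because 2.4GHz channels 1 through 13 sit 5MHz apart, starting at 2412MHz for channel 1. A small sketch of that arithmetic follows; the function name channel_to_mhz is just for illustration, and channel 14 and the 5GHz band are deliberately left out.

```shell
# channel_to_mhz: centre frequency in MHz for 2.4 GHz channels 1-13,
# which are spaced 5 MHz apart starting at 2412 MHz.
channel_to_mhz() {
    if [ "$1" -ge 1 ] && [ "$1" -le 13 ]; then
        echo $((2407 + 5 * $1))
    else
        echo "channel out of range: $1" >&2
        return 1
    fi
}

channel_to_mhz 3    # 2422, the 2.422G used above
channel_to_mhz 11   # 2462, matching the scan output
```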

You can set other parameters, but you should consider doing 
so only if you have a really good reason. One option is the sensitivity 
threshold, which defines how sensitive the card is to noise and signal 
strength, and you can set the behavior of the retry mechanism 
for the wireless card. You may need to play with this in very noisy 
environments. Set the maximum number of retries with: 

sudo iwconfig wlan0 retry 16 

Or, set the maximum lifetime to keep retrying to 300 milliseconds with: 

sudo iwconfig wlan0 retry lifetime 300m 

In a very noisy environment, you also may need to play with packet 
fragmentation. If entire packets can't make it from point to point 
without corruption, your wireless card may have to break down 
packets into smaller chunks to avoid this. You can tell the card 
what to use as a maximum fragment size with: 

sudo iwconfig wlan0 frag 512 

This value can be anything less than the size of a packet. Some cards 
may not apply these settings changes immediately. In that case, run this 
command to flush all pending changes to the card and apply them: 

sudo iwconfig wlan0 commit 

Two other useful commands are iwspy and iwpriv. If your card 
supports it, you can collect wireless statistics by using: 

sudo iwspy wlan0 

The second command gives you access to optional parameters for your 
particular card; iwconfig is used for the generic options available. If you
run it without any parameters (sudo iwpriv wlan0), it lists all available
options for the card. If no extra options exist, you will see output like this: 

wlan0 no private ioctls 

To set one of these private options, run: 

sudo iwpriv wlan0 private-command [private parameters] 

Now that your card is configured and connected to the wireless 
network, you need to configure your networking options to use it. If you 
are using DHCP on the network, you simply can run dhclient to query
the DHCP server and get your IP address and other network settings. 
If you want to set these options manually, use the ifconfig command 
(see the man page for more information).

— JOEY BERNARD
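
Putting the basic steps together, a connection on an open, DHCP-managed network could be scripted roughly as follows. This is a sketch rather than a tested recipe: the function names and the DRY_RUN switch are made up for illustration, and wlan0 and the essid are whatever applies on your system. With DRY_RUN=1, the commands are only printed.

```shell
# run_cmd: run a command through sudo, or just print it when DRY_RUN=1.
run_cmd() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

# wifi_up: pick an access point by essid, then ask DHCP for an address.
# Illustrative only; the interface and essid are passed in by the caller.
wifi_up() {
    iface=$1
    essid=$2
    run_cmd iwconfig "$iface" essid "$essid" &&
    run_cmd dhclient "$iface"
}

DRY_RUN=1 wifi_up wlan0 network-essid
```

Drop the DRY_RUN=1 prefix to run the commands for real. Extra settings, such as the channel or the retry limit, would slot in as additional run_cmd lines between the two shown.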
Dear Canonical, I Can Haz
Ubuntu One Source? 


I love Dropbox (www.dropbox.com)
and use it on all my computers, so
I was excited to see Canonical do
something similar with its Ubuntu
One program. That excitement was
quick to dwindle, however, when
I realized that although the client
software is completely open
source, the server bits are not.
Those of us running a
huge network of computers can't 
set up our own Ubuntu One server 
internally, and we can't hope for the 
community to add support for other 
operating systems. 

Ultimately, I wish Dropbox would 
become open source. That would not 
only give us cross-platform support, 
but also remove the "Ubuntu" slant 
that Canonical's product currently
sports. My suspicion, however, is
that sooner or later, Google will 
realize Dropbox is the Gdrive it has 
never had—and buy it. Although 
that would be really cool, and more 
people would adopt the already-amazing
Dropbox, it also would
mean no chance of Dropbox
coming to a Linux
server any time soon.

So again. Canonical, please open 
the source to Ubuntu One. You still 
can offer your cloud solution, but 
we also can make our own little 
clouds in-house. Who knows, maybe 
the community will add features 
and functionality that you'll want 
to adopt yourself. We'll be happy 
to share back! 

— SHAWN POWERS 



LinuxJournal.com 

This month, I'd like to take the time to acknowledge the many people
involved with Drupal, the open-source Web project that powers 
LinuxJournal.com. Although most of the core project is the work of 
a small group of developers whose thousands of contributions are 
the driving force behind Drupal's evolution, there are thousands 
of people contributing code and modules to the community. These 
contributions are what allow me to produce LinuxJournal.com 
without a large Web team, and I am extremely grateful for them. 
Although I frequently cite Drupal's flexibility and power as the 
reason I am able to remain a one-person team, this is absurdly 
misleading. I am a one-person team with a support system of 
thousands, and I also can credit Addison Berry of Lullabot with 
getting me through many tough spots throughout the past year. 

I believe Drupal is the best Web platform around, but I also
believe that is a result of having the best Web community around.
I am thrilled to be a small part of it.

With sites like WhiteHouse.gov adopting Drupal, this is an exciting 
time to be a Drupaler, and I encourage others to use it and get 
involved. Thank the community that makes your own work better, 
and don't forget to make your own contributions. 

— KATHERINE DRUCKMAN 


They Said It 


I think the biggest mistake most 
people make when they pick their 
first job is they don't worry enough 
about whether they'll love the work, 
and they worry more about whether 
it's a good experience. 

—Steve Ballmer 

Save early, save often. 

—Alwin Lee and everyone else 
who uses Microsoft Word 

From then on, when anything went 
wrong with a computer, we said it 
had bugs in it. 

—Rear Admiral Grace Murray 
Hopper, US Navy 

AOL is like the cockroaches left 
after the nuclear bomb hits. They 
know how to survive. 

—Jan Horsfall, VP of marketing 
for Lycos 

The Linux philosophy is "Laugh in
the face of danger." Oops. Wrong
one. "Do it yourself." Yes, that's it.

—Linus Torvalds 

If Gore invented the Internet,
I invented spell-check.

—Former Vice President J. 
Danforth Quayle 

What is the difference between 
apathy and ignorance? I don't 
know and I don't care. 

—World Entertainment War 

My problem with Linux is, that it 
makes it very difficult to handle porn. 

—Kshitij Sobti 

(Posted on thinkdigit.com on
April Fools' Day 2010)
CouchDB

Getting started with CouchDB, an increasingly popular non-relational 
database. 


The surge in interest in non-relational databases— 
often known collectively as NoSQL—is now impossible 
for Web developers to ignore. Whether you are 
looking at a non-relational database for reasons 
of scalability, availability, cost, performance or just 
because it's a shiny new toy, there's no denying 
that serious Web developers need to consider 
non-relational options when designing an application.
In the past few months, every project
on which I've worked has at least considered 
a non-relational solution, even when the final 
decision was to use a relational database. 

In the previous two installments of this column,
I looked at MongoDB, an object (or "document")
database with a somewhat relational feel.
MongoDB stores objects, but its query language
should look somewhat familiar to those of you who
have long used relational databases. If you're willing
to consider a more radical departure from the world
of relational databases and query syntax, instead
using the map-reduce paradigm, easy replication
and a straightforward RESTful API, you might
want to consider CouchDB, now part of the Apache
Software Foundation. Even if you don't use CouchDB 
in production environments, you may find (as I 
do) that its use of JavaScript, coupled with its 
implementation of map-reduce, helps you think 
in new and different ways about old problems. 

CouchDB Basics 

Downloading and installing CouchDB is extremely 
easy. If it's not available via a simple apt-get 
install (or the yum equivalent) for your system, or 
if you simply prefer to install a source version, you 
can download it from the CouchDB home page at 
couchdb.apache.org. The version I'm running is 
slightly out of date (0.10), rather than the latest 
production version at the time of this writing (0.11). 
Nevertheless, the differences aren't that great, 
especially for the simple examples shown here. 

After I installed CouchDB with apt-get, I started it 
with the following standard command on my server: 

/etc/init.d/couchdb start 

That starts the CouchDB server on port 5984. By 
default, this means I can access the CouchDB server as: 

http://127.0.0.1:5984/ 


If you are interested in accessing your CouchDB 
server from another system, you can modify the 
CouchDB configuration file (/etc/couchdb/default.ini 
on my machine) by going to the "httpd" section 
and replacing the name-value pair: 

bind_address = 127.0.0.1 

with your IP address instead of 127.0.0.1 (that is,
localhost). Restart CouchDB, and it will be accessible
not only to local HTTP clients, but also across
the Internet.

Obviously, starting a CouchDB server on its well-known
port and without any security restrictions is
asking for trouble. If you are running a production 
instance of CouchDB, you should ensure that it cannot 
be accessed or modified by the general public. 
CouchDB comes with basic authentication options 
that make it possible to restrict access to databases, 
and you should look into those before deploying 
your system to a public server. 

If you point your Web browser to your CouchDB 
server at port 5984, you will see the following: 

{"couchdb":"Welcome","version":"0.10.0"} 

This response tells you several things. First, you 
see that all communication in CouchDB takes place 
using JSON, the JavaScript object notation that has 
become a lightweight method for communication 
among Internet applications. Although CouchDB 
is written in Erlang, an open-source language 
designed for distributed processing, nearly everything
associated with it uses JavaScript. Functions
(as you soon will see) are written in JavaScript, and 
both inputs and outputs are sent using JSON. 

CouchDB is also RESTful—it uses the entire 
vocabulary of HTTP verbs to describe what should 
happen and a URL to indicate the object on which 
the action should take place. Most people are 
familiar with HTTP's GET and POST verbs, but less 
so with PUT and DELETE. CouchDB uses all of these, 
combining HTTP, JSON and REST for rich effect. 

Thus, when you point your Web browser to
your CouchDB server on port 5984, asking for the
document /, you actually are issuing a GET request
for the document named /. CouchDB's response
describes the server, rather than an individual
document. The response is an object (equivalent
to a "hash" or "dictionary" in languages such as
Perl, Ruby or Python) with two keys. The first, 
"couchdb", simply says "Welcome". The second, 
named "version", tells you the version of the 
server that is running—in this case, 0.10.0. 

Let's change the URL somewhat, going instead 
to the URL /_utils. If you go to that document, 
you'll see a much more interesting response. 
Indeed, rather than receiving JSON, you will get 
a full-fledged Web page, with a CouchDB logo 
in the top right. This is Futon, the CouchDB 
Web-based interface. It is sometimes called the 
administrative interface, but it is also quite useful 
for experimenting with the database. 

Along the right side of the main Futon page is 
the main "tools" menu. It normally comes up in the 
overview mode, but you can switch to a number of 
other screens by clicking on them. Most interesting 
to me is the test suite, which provides a Web-based 
interface to ensure that your CouchDB installation 
is working correctly. Although it is unlikely that 
your system has any problems, you still might want 
to run the test suite, just for personal satisfaction 
and thoroughness. 

Creating and Populating a Database 

Going back to the overview screen, you should see 
a prompt at the top saying create database. Just 
as with most relational database systems, a single 
server may contain more than one database. 

Each database then may contain any number of 
documents, each of which has a unique ID and 
any number of name-value pairs. 

So to get started, you need to create a new 
database. Click on the link, and an AJAX dialog box 
opens up, asking for the name of the database. 

I'm going to assume a database name of "atf" for 
this column, although you might want to choose 
something closer to your own name or interests. 
You may use any alphanumeric characters (plus 
some symbols) for a database name, keeping in 
mind that a leading underscore is used by internal 
CouchDB systems, meaning that you should avoid 
such names for your own work. 

After you create a database, you'll be brought 
to the browse database page. Click on the new 
document button to create a new document. 
CouchDB automatically gives the new document a 
unique ID value (key name "_id"). You may change
the ID to one of your liking, if you have a unique 
numbering or naming scheme that you prefer. 

Then, you may add as many name-value pairs 
as you like, by clicking on the add field button. The 
name is assumed to be a string, but the value may 
be any legitimate JSON value—a number, string, 
array or object. If you enter an array (within square
brackets) into the interactive Futon interface, upon
completion, it will be represented visually as an
array. The same is true with a JSON object. After 
you enter it, the name-value pairs are displayed 
in an easy-to-read format. 

Once you have added some fields to your 
document, click the save button. 

I added a number of fields to a document 
describing me. The fields tab in Futon shows me 
these values in a nice, easy-to-edit format. If I want 
to see the document in its native JSON, I can click 
on the source tab and see it there: 


{
   "_id": "0534ca63b70beb02d24b62ec4fe72566",
   "_rev": "4-bea8364f4536833c1fd7de5781ea8a08",
   "first_name": "Reuven",
   "last_name": "Lerner",
   "children": [
       "Atara",
       "Shikma",
       "Amotz"
   ]
}

Notice that in addition to the fields I already 
have mentioned, there is a "_rev" field. That's 
because when you save a document, the old version 
does not disappear. Rather, CouchDB keeps the old 
one around, much as a garbage collector handles 
memory in high-level languages, such as Ruby and 
Python. This means there can be multiple documents 
with the same "_id" field, although only one is
considered current—the one with the latest "_rev" 
field value. The revision contains an integer as well 
as an MD5 hash value. You normally can look at 
only the integer to identify the revision, ignoring 
the hex portion of the string. 
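
As a small illustration of that two-part revision string, here is a minimal shell sketch that splits a "_rev" value at its first hyphen. The value is the one shown in the document above; the variable names are made up for this example.

```shell
# Revision string as shown in the sample document above.
rev="4-bea8364f4536833c1fd7de5781ea8a08"

# Everything before the first hyphen is the integer revision counter.
rev_number="${rev%%-*}"

# Everything after the first hyphen is the hex hash portion.
rev_hash="${rev#*-}"

echo "revision number: $rev_number"
echo "revision hash:   $rev_hash"
```

Only the integer part is usually needed to tell which of two revisions is newer.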

Do not mistake the revision tag as a means 
of keeping backups or for version control. The 
moment someone compacts a database, all of 
the old revisions are removed. 

As with other non-relational databases, 
CouchDB allows you to add, remove and rename 
fields whenever you like. Each document in a 
database might have its own unique field names, 
although in practice, this is fairly rare. It is far 
more common for each document to have a 
common set of fields, perhaps with some variation 
in special cases. It is common to say that 
CouchDB is "schemaless", but I think it's safer 
to say that CouchDB (and other NoSQL storage 
facilities) allows the programmer to decide on 
the schema at runtime, rather than in advance— 
much as a dynamic programming language 
allows you to determine the type of a variable 
at runtime, rather than at compile time. 

One thing that obviously is missing from a 
JSON-based database is the notion of a foreign
key—a pointer from one document, or record, 
to another. There is no built-in facility for linking 
one document to another, although there certainly 
are ways to use information in one document to 
view another document. 

Outside Futon 

It's very nice that CouchDB comes with an easy-to-use,
browser-based interface. However, this interface
is clearly not what you want to be using from your
applications. As I wrote above, CouchDB communicates 
with the outside world using JSON over HTTP. Any 
action that you just performed via the browser also 
should be possible via an HTTP client. You could 
use a library for a programming language; every 
major language has at least one CouchDB client. 
But a popular and easy-to-use option is the curl 
command-line program. 

To send a simple GET request to my CouchDB
server, I can write:

curl http://atf.lerner.co.il:5984/

And sure enough, I receive the same response
as before:

{"couchdb":"Welcome","version":"0.10.0"}

Unfortunately, if something goes wrong, curl
won't say much. For that reason, I generally prefer
to use the -v option to curl (and most other programs,
for that matter), which shows me the HTTP request
and response as they take place. It also comes in
handy to specify the HTTP verb you want to use
(GET, in this case), so I'll do that with the -X option.
Thus, I can write:

curl -vX GET http://atf.lerner.co.il:5984/

And I see:

* About to connect() to atf.lerner.co.il port 5984 (#0)
*   Trying 69.55.225.93... connected
* Connected to atf.lerner.co.il (69.55.225.93) port 5984 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.19.4 (universal-apple-darwin10.0) libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
> Host: atf.lerner.co.il:5984

{"couchdb":"Welcome","version":"0.10.0"}
* Connection #0 to host atf.lerner.co.il left intact

You might notice that the "Content-type"
response header indicates that what the server
sends back is in text/plain format. So, although
you might see the content as JSON, CouchDB
itself indicates that it's sending plain text. This
isn't a big deal, unless you are writing a program
that specifically waits for JSON, in which case you
might need to modify its expectations a bit.

You can request your Futon URL as well, using
HEAD to avoid the long response:

* Connected to atf.lerner.co.il (69.55.225.93) port 5984 (#0)
> HEAD /_utils/ HTTP/1.1
> User-Agent: curl/7.19.4 (universal-apple-darwin10.0) libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
> Host: atf.lerner.co.il:5984
> Accept: */*

< Server: CouchDB/0.10.0 (Erlang OTP/R13B)
< Content-Type: text/html
< Content-Length: 3158

In this case, you get a text/HTML response. And,
of course, you know that Futon sends HTML for its
response, because you already have been using it
from a Web browser.

Now, let's try to look at the atf database, 
which I created earlier, that contains a single 
document (that is, record). How can I retrieve 
that information? 

Well, I can start by asking for the database 
(leaving off the -v option now for space reasons): 

~$ curl -X GET http://atf.lerner.co.il:5984/atf

{"db_name":"atf","doc_count":1,"doc_del_count":0,"update_seq":4,
"purge_seq":0,"compact_running":false,"disk_size":16473,
"instance_start_time":"1271067859057749","disk_format_version":4}

In other words, asking for a database gives basic 
information about that database, from the number 
of documents to the amount of space it consumes 
on the disk. 
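
If a shell script needs just one of those numbers, a rough sketch like the following can pull it out with sed. This assumes no JSON-aware tool is installed and that the response arrives on a single line, as it does above. The response text is copied from the example output, and the variable names are made up for illustration.

```shell
# Sample response copied from the example output above (joined onto one line).
# In real use this might come from: response=$(curl -s -X GET http://atf.lerner.co.il:5984/atf)
response='{"db_name":"atf","doc_count":1,"doc_del_count":0,"update_seq":4,"purge_seq":0,"compact_running":false,"disk_size":16473,"instance_start_time":"1271067859057749","disk_format_version":4}'

# Capture the digits that follow each key and discard the rest of the line.
doc_count=$(printf '%s\n' "$response" | sed 's/.*"doc_count":\([0-9][0-9]*\).*/\1/')
disk_size=$(printf '%s\n' "$response" | sed 's/.*"disk_size":\([0-9][0-9]*\).*/\1/')

echo "documents: $doc_count, bytes on disk: $disk_size"
```

This is fragile by design: it works only for simple numeric values, so a real JSON parser is the better choice for anything beyond a quick check.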

You can retrieve an individual document by 
using its ID: 
~$ curl -X GET \
   http://atf.lerner.co.il:5984/atf/0534ca63b70beb02d24b62ec4fe72566

{"_id":"0534ca63b70beb02d24b62ec4fe72566",
"_rev":"4-bea8364f4536833c1fd7de5781ea8a08",
"children":["Atara","Shikma","Amotz"]}

If I want to modify one or more fields in this
document, or even add another field, I can do so
with a PUT command. curl's -d option lets me
specify a document on the command line:

~$ curl -X PUT \
   http://atf.lerner.co.il:5984/atf/0534ca63b70beb02d24b62ec4fe72566


Well, this is surprising. CouchDB is complaining that 
it cannot perform the update I need, because there is a 
conflict. Notice that it does not report the error using 
HTTP codes (such as 500), but rather by sending a 
JSON object back to me, containing the "error" key. 

The reason CouchDB gives an error message 
here is that I haven't indicated which revision I 
am attempting to update. Without such a revision 
indicator, CouchDB assumes I have stale data and, 
thus, will not allow me to update the document. 
Only if I send my update with the current "_rev" 
value will the update succeed. For example: 

~$ curl -X PUT \
   http://atf.lerner.co.il:5984/atf/0534ca63b70beb02d24b62ec4fe72566 \
   -d '{"_rev": "4-bea8364f4536833c1fd7de5781ea8a08",
        "first_name": "Superman", "middle_initial": "M." }'

CouchDB responds with:

{"ok":true,"id":"0534ca63b70beb02d24b62ec4fe72566",
"rev":"5-fe6fccb89b9512d26120fbd63dbb15c4"}

In other words, the update succeeded, incrementing
the revision. If you try the same update
again, you will get the same "update conflict" 
error message as before, because there can be 
only one update to a given revision. 

Note that when you PUT an update to a document, 
you must update the entire document at once. Unlike 
the UPDATE command in a relational database, adding 
a new revision to a CouchDB document does not 
modify individual fields. Rather, it stores an entirely 
new document with the same ID and an incremented 
revision number. This means in this example, it's true
that I have added the "middle_initial" field successfully.
However, I also have effectively removed the "children"
field, because I did not specify it in my PUT statement.
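
To keep fields from vanishing, the body of a PUT has to carry every field you want to survive. The following is a minimal sketch of building such a body in the shell. It assumes the revision returned by the earlier update, and the field values simply mirror the examples in this column. The curl call itself is left as a comment so the sketch runs without a server.

```shell
# Assumed values: the URL and revision mirror the examples in this column.
url="http://atf.lerner.co.il:5984/atf/0534ca63b70beb02d24b62ec4fe72566"
rev="5-fe6fccb89b9512d26120fbd63dbb15c4"

# Single quotes around the printf format keep the shell from touching
# the double quotes that JSON requires around keys and string values.
body=$(printf '{"_rev": "%s", "first_name": "%s", "middle_initial": "%s", "children": ["Atara", "Shikma", "Amotz"]}' \
  "$rev" "Superman" "M.")

echo "$body"

# To send it (not run here): curl -X PUT "$url" -d "$body"
```

Because the whole document is replaced, a safe habit is to GET the current version first, change what you need and PUT the complete result back.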

You can add an entirely new document to your 
database using the POST verb in HTTP. For example: 

curl -X POST http://atf.lerner.co.il:5984/atf

Sure enough, I get the following response, 
indicating that a new document was created: 

{"ok":true,"id":"aeb6925eb23278f1b8e530ba67b0172d",
"rev":"1-f0e336978a368f679ee7b280107bc2fb"}

I should add that I had a terrible time trying to use 
curl to create a document, all because of the quotes. It 
seems that you must use double quotes inside a JSON 
request (around the names of the keys and values). 
Single quotes result in a strange error message indicating
that the UTF-8 encoding for JSON is invalid, which
did not quite point me in the right direction. 

Conclusion 

CouchDB is an increasingly popular non-relational 
database, offering a great deal of flexibility in storage 
and retrieval. This month, I explained how to create 
databases in CouchDB and do basic storage and 
retrieval using both the Web-based Futon interface 
and curl. Next month, I will demonstrate writing 
JavaScript functions that process and display the 
data, demonstrating the true power of CouchDB.


Reuven M. Lerner is a longtime Web developer, architect and trainer. He is a PhD 
candidate in learning sciences at Northwestern University, researching the 
design and analysis of collaborative on-line communities. Reuven lives with 
his wife and three children in Modi'in, Israel.


Resources 


The home page for CouchDB is at the Apache Project 
(couchdb.apache.org). There, you can not only download 
the software, but also read documentation, from tutorials 
to an active wiki. The CouchDB Web site also has links to 
drivers for the various languages you're likely to use when 
working with CouchDB. 

If you're interested in the JSON format used by CouchDB, you 
can learn more about it at the main Web site: json.org. 

Finally, two good books on CouchDB were released in the 
past few months. Beginning CouchDB by Joe Lennon and 
published by Apress is aimed more at beginners, but it has 
a solid introduction to CouchDB, Futon and how you might 
use the system. CouchDB: The Definitive Guide by J. Chris 
Anderson, Jan Lehnardt and Noah Slater, published by O'Reilly, 
is a bit more advanced and meaty, but it might not be 
appropriate for beginning users of non-relational databases. 
DAVE TAYLOR


Simple Scripts to 
Sophisticated HTML 
Forms, Take II 

Parsing HTML files. 


We've been digging into the Yahoo Movies 
database for the past few months, as you'll recall, 
building a command called findmovie that will 
have the following usage: 


USAGE: findmovie -g genre -k keywords -nrst title 


However, we slammed into a wall at 100kph last
month in the simplest of calculations: how many
titles match a given combination of query elements?

For example, how many action films are there that
have "death" in the title? That'd look like findmovie
-g act death, but making that count actually work is
tricky, because the Yahoo Movies database output
is different depending on whether there are zero
matches, less than a page of matches or more than a
page of matches. Examples of each output are "Sorry,
no matches were found", "(All results shown)" and
"< Prev 11 - 20 of 143 | Next 20 >", respectively.

Oh, and it gets worse. Sometimes when there's
less than a full page of results, you'll see something
like this: "< Prev | 1 - 3 of 3 | Next >" instead.

It's pretty much a huge pain in the booty, and
even if you crack open the source, there's no handy
spot that says "0" or "4" or "143". So, that's what
I want to focus on this month: parsing an HTML
file to isolate and identify this particular data point.

Caching the Results

The first observation I have about identifying a
solution is that we are going to need to cache
(or save) the results, so we can parse them more
than once to see what we find. This brings up
the old shell scripting challenge of choosing a
good, unique, temporary filename.

I'm old-school. I'm used to using .$$ to use
the process ID as the basis of the temp file, but
in fact, there are better solutions in modern
Linux systems. Check out mktemp if you're on
a BSD-based system. If that's not available, use
man smartly: man -k temp | grep '(1' will
extract the replacement that your distro has
instead. Here's a typical use of mktemp:

appname=$(basename $0)
TMPFILE=$(mktemp /tmp/${appname}.XXXXXX) || exit 1

It looks pretty similar, but by using that many X 
characters, the program uses the PID and random 
letters, making the temp file impossible for a hacker 
to guess or anticipate. The version of this script I've 
been developing on my Mac OS X system had the 
following code snippet: 

if [ $dump -eq 1 ] ; then
  exec /usr/bin/curl --silent "$baseurl${params}\&p=$pattern"
else
  exec open -a safari "$baseurl${params}\&p=$pattern"
fi


The problem here is that using exec to invoke a 
command replaces the shell script with the command 
in question, which isn't going to work. Instead, it's 
time to rewrite it: 

if [ $dump -eq 1 ] ; then
  appname=$(basename $0)
  TMPFILE=$(mktemp /tmp/${appname}.XXXXXX) || exit 1
  /usr/bin/curl --silent "$baseurl${params}\&p=$pattern" \
     > $TMPFILE
else
  exec open -a safari "$baseurl${params}\&p=$pattern"
fi

That looks good. If we're dumping the file 
source, it'll go to the temporary file for later 
analysis. If it's a request that is supposed to launch 
the search results in a browser, it still uses the 
Mac OS X open command. 
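
One thing the snippet above does not show is cleaning up the cache file afterward. Here is a minimal sketch of one way to do that with a trap. The fixed appname value and the echo that stands in for the curl call are assumptions made so the sketch runs on its own.

```shell
# Assumption: a fixed name stands in for $(basename $0) so the sketch
# behaves the same however it is invoked.
appname="findmovie"
TMPFILE=$(mktemp "/tmp/${appname}.XXXXXX") || exit 1

# Remove the cache file when the script exits, whether normally or on error.
trap 'rm -f "$TMPFILE"' EXIT

# Stand-in for the curl call: write some page source into the cache file.
echo "Sorry, no matches were found" > "$TMPFILE"

# The cached copy can now be searched as many times as needed.
hits=$(grep -c -i "no matches were found" "$TMPFILE")
echo "cached in $TMPFILE, matching lines: $hits"
```

With the trap in place, the script can exit from any branch without leaving stray files in /tmp.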

Parsing the Results 

To figure out what's going on, we need to account
for three different possibilities, each of which
has a different "fingerprint" in the source file.
Here's a rough template:

if [ ! -z "$(grep -i "no matches were found" $TMPFILE)" ]
then
  echo there are zero results for that search.
elif [ ! -z "$(grep -i "Next&nbsp;&gt;" $TMPFILE)" ]
then
  echo got some results with case two.
else
  echo more than a page of results
fi


Here, I'm showing only output echo statements
to give you a sense of the algorithm, but you can
see that we're just testing for a known string that
hopefully won't show up in other situations. Note
the second test, though: Next&nbsp;&gt; is some
HTML weirdness. "nbsp" is a non-breaking space,
and "gt" is the > symbol. Wrap 'em in & and ;
and you have HTML character entities.

To ascertain the total match count requires yet
more parsing of the output. Search for "death
race", and you'll find three matches, which end
up looking like this:

< Prev | 1 - 3 of 3 | Next >

Unfortunately, it's rather buried in a more complicated
pattern, because here's a typical match:

<td align=right><font face=arial size="-2"><nobr>&lt;&nbsp; Prev&nbsp;|&nbsp;<b>1 - 3</b>&nbsp;of&nbsp;<b>3</b>&nbsp; ...

I have to admit, I was stumped for a bit, which
is why having geeky friends like Martin and Lucretia
M. Pruitt is so darn helpful. I posed this puzzle on
Twitter (I'm @DaveTaylor if you want to follow me),
and after some false starts, they suggested a simple
and logical solution: turn the <b> and </b> into
individual character delimiters, then simply use cut
to pull out the field we seek. Smart!

Here's how that looks as a simple command
sequence:

grep -i "1 - " $TMPFILE |
  sed 's/<b>/~/g;s/<\/b>/~/g' |
  cut -d\~ -f4

Armed with this, the ugly HTML sequence above
quickly reduces down to the value 3, which is exactly
what we want. One nuance, though. It turns out
that this data appears both before and after the
matches, so we need to slip in | head -1 to ensure
that we're parsing only one line and not duplicating
the data entry or confusing the new parser. This
means we can create the following code:

if [ ! -z "$(grep -i "no matches were found" $TMPFILE)" ]
then
  matches=0
elif [ ! -z "$(grep -i "Next&nbsp;&gt;" $TMPFILE)" ]
then
  matches="$(grep -i "1 - " $TMPFILE | head -1 | \
    sed 's/<b>/~/g;s/<\/b>/~/g' | cut -d\~ -f4)"
else
  matches="$(grep -i "1 - " $TMPFILE | head -1 | \
    sed 's/<b>/~/g;s/<\/b>/~/g' | cut -d\~ -f4)"
fi

You can see how I'm differentiating the three
cases and how the resultant code is identical in
the second and third cases. In fact, they don't need
to be separate cases, so the count is more easily
calculated like this:

if [ ! -z "$(grep -i "no matches were found" $TMPFILE)" ]
then
  matches=0
else
  matches="$(grep -i "1 - " $TMPFILE | head -1 | \
    sed 's/<b>/~/g;s/<\/b>/~/g' | cut -d\~ -f4)"
fi

If you initialize matches to zero, you actually
can flip the logic of the first conditional and prune
it down even further:

matches=0
if [ -z "$(grep -i "no matches were found" $TMPFILE)" ]
then
  matches="$(grep -i "1 - " $TMPFILE | head -1 | \
    sed 's/<b>/~/g;s/<\/b>/~/g' | cut -d\~ -f4)"
fi

Nice. It's a simple, straightforward and fine example 
of how if you keep thinking about what you're really 
accomplishing with complex conditionals, they often 
can be not only simplified, but sped up too. 
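
To see the delimiter trick work without hitting Yahoo Movies, here is a small self-contained sketch. The sample line is modeled on the HTML fragment quoted earlier and is written to the cache file twice, mimicking the way the counter appears both before and after the matches. The fixed file name pattern is an assumption for illustration.

```shell
# Hypothetical sample line modeled on the HTML fragment quoted in this column.
sample='<td align=right><font face=arial size="-2"><nobr>&lt;&nbsp; Prev&nbsp;|&nbsp;<b>1 - 3</b>&nbsp;of&nbsp;<b>3</b>&nbsp;'

TMPFILE=$(mktemp /tmp/findmovie.XXXXXX) || exit 1

# The counter shows up above and below the results, so write it twice.
printf '%s\n%s\n' "$sample" "$sample" > "$TMPFILE"

matches=0
if [ -z "$(grep -i "no matches were found" "$TMPFILE")" ] ; then
  # Turn <b> and </b> into ~ so cut can treat the line as ~-separated fields;
  # the fourth field is the total match count.
  matches="$(grep -i "1 - " "$TMPFILE" | head -1 | \
    sed 's/<b>/~/g;s/<\/b>/~/g' | cut -d\~ -f4)"
fi

echo "matches: $matches"
rm -f "$TMPFILE"
```

Run as is, this prints "matches: 3", since the fourth ~-delimited field of the sample line is the total.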

Next Month 

While writing these columns on working with Yahoo 
Movies, I've found my interest has been pulled in a 
different direction: a "name that tune" game. That's 
what we'll start working on next month. If you want 
to get a sneak peek at it and see how it evolves in real 
time (rather than here in Linux Journal), jump on 
Twitter and follow @SongTitle. It's going to be fun!


Dave Taylor has been hacking shell scripts for a really long time. 30 years. He's 
the author of the popular Wicked Cool Shell Scripts, and he can be found on 
Twitter as @DaveTaylor and more generally at www.DaveTaylorOnline.com. 
KYLE RANKIN


Lightning Hacks—SSH 
Strikes Back 

In this third Lightning Hacks roundup, check out how to automate 
screen connections, build reverse tunnels and use the elusive SSH 
command line. 


Every year or so, I like to write a column I title 
"Lightning Hacks". This column is inspired by the 
lightning talks common at most conferences. In a 
lightning talk, instead of having one speaker give 
a 60-minute presentation, multiple speakers give 
short 5-10-minute presentations. By the end of a 
lightning talk, you end up hearing about all sorts 
of cool topics that wouldn't have gotten their 
own time slot. In this column, I get a chance to 
talk about a few cool "hacks" I've run across that 
wouldn't fill an entire column by themselves. 

In prior Lightning Hacks columns, I've covered a 
number of different topics, but this time I've decided 
to focus on only one: SSH. Like many system 
administrators, I spend a great deal of my day 
within SSH sessions, and over the years, I've 
found a few shortcuts and handy tips that I save 
in shell scripts so I don't forget them. 

Automatically Load Screen-Like Sessions 

This first hack seems really simple—after all, I am 
adding only one extra flag to SSH. Normally, if you 
want to ssh into a machine and run a program, 
you simply pass the program at the end of your 
SSH command: 

$ ssh user@remotehost.example.org df 

Yet, if you ever have tried to write a shell 
script that would automatically ssh in to a 
remote machine and launch mutt or screen or 
similar programs, you have seen the session either 
sit there or exit with some message like "Must be 
connected to a terminal." I ran into this problem 
on my N900 palmtop when I wanted to launch 
two special terminal sessions: one that automatically 
reconnected to a remote screen session and another 
that loaded mutt. Yeah, that's right. I still prefer 
mutt and irssi, even on a palmtop. Neither worked 
though until I added the -t flag: 

$ ssh -t user@remotehost.example.org screen -dr 
$ ssh -t user@remotehost.example.org mutt 

The first example connects to the remote host
and re-attaches my remote screen session (I run
only a single screen session on my host and then 
use Ctrl-a c to create windows within that session). 
The second example simply runs mutt. The -t flag 
forces pseudo-tty allocation. It turns out that 
when you run programs like screen or mutt, you 
need to force SSH to create a pseudo-tty. 

Route around Bothersome Firewalls 

I know a million articles have been written about
SSH tunneling, but this particular type of tunneling
is so useful that it bears repeating, especially since
I use it infrequently and forget the proper syntax.
A problem you often
may run into is needing to scp a large number
of files between two servers (let's say londonweb1
and seattleweb1), but for some reason, the two
machines are firewalled off from each other.
Usually, you have one server that is able to ssh
into both machines (let's call that server admin1),
and if it were just one or two files that needed
to be transferred, you could copy the files first
from londonweb1 to admin1, then from admin1
to seattleweb1.

When you need to transfer multiple files (or
perhaps pipe dd traffic) between the two sites, it
can be impractical, if not impossible, to move
data to an intermediary server first. That's where
SSH reverse tunnels come in handy. With a reverse
tunnel, you launch an SSH session from your
intermediary server (admin1 in this case) to the
first server (londonweb1) and open up a local high
port that is unused, such as 2222. Then, you tell
SSH to tunnel all traffic on that port over to the
remote server (seattleweb1). Once the tunnel is
set up, you can use scp as you normally would,
except you point it to localhost port 2222.

To set up the tunnel, I would run the following 
command from adminl: 

kyle@admin1:~$ ssh -R 2222:seattleweb1:22 londonweb1

The arguments to -R can be easy to mix up. Note
that the last server in the command (londonweb1) is
the server to which I log in. The first argument to -R is
the port to open up on that server (2222). The next
two arguments list the server and port to which any traffic
is forwarded (seattleweb1 and 22, respectively).
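
Because the argument order is easy to get backward, here is a minimal sketch that only prints the command so it can be checked before it is run. The function name reverse_tunnel_cmd is made up for this illustration and is not part of OpenSSH; the host names and port are the ones from the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH): print the reverse-tunnel
# command so the argument order can be reviewed before running it.
# Usage: reverse_tunnel_cmd LISTEN_PORT TARGET_HOST TARGET_PORT LOGIN_HOST
reverse_tunnel_cmd() {
    listen_port=$1   # port opened on the host you log in to
    target_host=$2   # host the traffic is forwarded to
    target_port=$3   # port on the target host
    login_host=$4    # host you log in to (last on the command line)
    printf 'ssh -R %s:%s:%s %s\n' \
        "$listen_port" "$target_host" "$target_port" "$login_host"
}

reverse_tunnel_cmd 2222 seattleweb1 22 londonweb1
# prints: ssh -R 2222:seattleweb1:22 londonweb1
```

Reading the printed line left to right gives the same order as the prose: port to open, destination host, destination port, and finally the host that receives the login.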

Once I log in to londonweb1, I can use scp (or rsync) like
I normally would, but I point it to localhost port 2222:

kyle@londonweb1:~$ scp -r -P 2222 /var/www/mysite localhost:/var/www/

When I initiate this scp command, all the traffic enters the
tunnel and goes to admin1, and then from there, it is forwarded
to port 22 on seattleweb1. Keep in mind that this means if
these machines are far apart, your bottleneck will be the slowest
link between the servers.
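
Since rsync was mentioned as an alternative to scp, here is a rough sketch of the same copy with rsync. It assumes rsync is installed on both web servers and that the reverse tunnel on port 2222 is already open; the wrapper name sync_through_tunnel is invented for this example.

```shell
#!/bin/sh
# Port on localhost where the reverse tunnel listens (2222 in this article).
TUNNEL_PORT=2222

# Hypothetical wrapper: copy a directory through the tunnel with rsync.
# -a preserves permissions and timestamps; -e makes rsync run ssh on the
# tunnel port instead of the default port 22.
sync_through_tunnel() {
    src=$1
    dest=$2
    rsync -a -e "ssh -p $TUNNEL_PORT" "$src" "localhost:$dest"
}

# Example call on londonweb1 (commented out so that sourcing this file
# does not start a transfer):
#   sync_through_tunnel /var/www/mysite /var/www/
```

Unlike scp, rsync skips files that already match on the far side, which helps when a long transfer over a slow link gets interrupted and has to be restarted.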

If you are a security-minded individual in charge of a network, 
you may not like how easy it is to route around your basic 
firewall rules. It's important to realize that reverse tunnels also 
can be used to connect from inside your network to a person's 
home machine, so even with incoming firewall rules set, a user 
still could tunnel in. 

Adding SSH Tunnels on the Fly 

A lesser-known feature of SSH is that you can enter an
internal command-line mode in an existing session and add
extra tunnels. Let's say you already have an SSH session
open from admin1 to londonweb1, and now you want to
add the reverse tunnel without having to log out. First,
press ~C (that's the ~ character and then a capital C,
typed at the start of a line) to open the SSH command line.
Then, you can add extra port-forwarding commands as though
they were part of the original SSH command line. When you
are done, simply press Enter to return to the regular shell:

kyle@londonweb1:~$
ssh> -R 2222:seattleweb1:22
Forwarding port.

kyle@londonweb1:~$

This also could be useful if you use regular SSH tunnels
(the -L option) as a poor man's VPN and realize that, for
instance, you need to set up an extra VNC or RDP tunnel to a
new server. When you use the SSH command line, you won't
have to close and break any existing sessions you have.
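
As a sketch of the -L case, the following assumes a VNC server listening on port 5900 of a machine called newserver, reachable from admin1. The name newserver, local port 5901 and the wrapper open_vnc_tunnel are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper; "newserver" and local port 5901 are placeholders.
# Opens a session to admin1 with local port 5901 forwarded to the VNC
# port (5900) on newserver, as seen from admin1.
open_vnc_tunnel() {
    ssh -L 5901:newserver:5900 admin1
}

# To add the same forward to a session that is already open, press ~C
# at the start of a line and type the option at the prompt:
#   ssh> -L 5901:newserver:5900
```

Either way, a VNC client pointed at localhost port 5901 should then reach newserver. On recent OpenSSH releases, the ~C command line may have to be enabled first with the EnableEscapeCommandline client option.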


Kyle Rankin is a Systems Architect in the San Francisco Bay Area and the author of a number
of books, including The Official Ubuntu Server Book, Knoppix Hacks and Ubuntu Hacks. He is
currently the president of the North Bay Linux Users’ Group.

Adventures in Scanning

Has scanning under Linux improved? 


Like many geeks, I dream of a paperless office. I 
don't know when that phrase first came into use, 
but a quick scan around both my home and work 
office convinced me it's still a long way off. To add 
insult to injury, the fax machine as a means of business 
communication seems to be a zombie technology 
that refuses to die. All too often when I deal with 
businesses, they cheerfully tell me to fax something 
to a number they provide. That is all well and good, 
but because I made the switch to VoIP (Voice over 
IP), which does not support faxing, I am forced to 
make trips to the local Kinko's more than I would 
like to admit. Time has moved on, and a number of 
businesses now will accept a PDF file with scanned 
versions of the documents. This fills me with both 
joy and terror. I'm happy I don't have to find a landline 
with a fax machine, but I'm terrified of scanning 
under Linux. I haven't done it in a very long time 
(years), mostly because my experience was so bad 
and frustrating, I resolved to leave it as one of those 
things I cannot do under Linux. 

Finding Your Scanner 

You already may have a scanner lying around, but I 
didn't. I searched the Internet where all roads led 
to the SANE (Scanner Access Now Easy) Web site, 
so that's where I started. SANE is the main clearing
house for information related to scanners and scanning.
It has a big list of devices that shows how well
they are supported, but the list is less helpful than it 
looks, primarily because it focuses on listing all the 
scanners that are known to work, including many 
that are no longer manufactured. I spent a lot of 
time trying to find a scanner that was both on the 
list and available from Amazon.com. I then found 
the best path was to go to the Ubuntu forums and 
search for recommended scanners. If you use another 
distribution, check its forum or assume that if it 
works for Ubuntu, it will work with any other modern 
distribution (although that's not always a safe 
bet). My main criteria were size (smaller the better), 
USB (is there another interface now?) and cheap 
(less than $100). 

After some searching, I found recommendations
for the Epson v300 and the Epson v500. On
Amazon.com, the v300 was available for $89, and
the v500 was $165. I am sure the v500 is awesome,
but given that I mainly wanted to use it for documents,
I didn't think I needed to pay double. I was a little
confused because the scanner is called a photo
scanner, but the dimensions showed that it could
scan a full sheet of standard letter paper. It even
has a hinge so you can scan from a book.

The Moment of Truth 

I hooked up the scanner to the computer. I'm not
sure what I expected to happen, but nothing did.
I realized I needed something actually to use the
scanner. It turns out that the word scanner means
security scanner a lot more often than image scanner.
This made locating software a little more difficult.
I found out that I already had packages for sane
and xsane installed. xsane is the graphical front
end to the SANE library, and it gives you a GUI
to control your scanner. Because it was already
installed, I started with xsane. Right off the bat,
I hit a problem. By default, xsane connected to
/dev/video0, which is my Webcam.
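
When xsane grabs the wrong device, it can help to ask SANE what it actually sees. The sketch below assumes the scanimage tool that ships with SANE is installed. The helper name sane_device_names is made up, and the sed pattern assumes the usual "device `NAME' is a ..." line format that scanimage -L prints.

```shell
#!/bin/sh
# Hypothetical helper: read "scanimage -L" output on stdin and print only
# the device names (the text between the backtick and the closing quote).
sane_device_names() {
    sed -n "s/^device \`\([^']*\)'.*/\1/p"
}

# Typical use (requires a working SANE installation):
#   scanimage -L | sane_device_names
```

If only a v4l entry for the Webcam shows up, SANE has no driver for the scanner yet, which matches what I saw at this stage. xsane should also accept one of the listed names as its argument to skip the device chooser, but check that against your xsane version.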



Figure 1. xsane 

After a little more research, I ended up at
Avasys. It provides drivers and a software utility for
talking to my new scanner. I had to download a
64-bit deb for iscan and esci-interpreter. I clicked on
Image Scan! for Linux, and the program told me it
wasn't able to talk to the scanner. I was beginning
to have flashbacks to the last time I tried using
a scanner. Remaining calm, I power-cycled the
scanner and tried the application again. This time,
it started up without complaint. I was able to click
Scan, and a scan of a book cover showed up.

I was able to scan to several different file types: TIFF,
JPEG, PNM, PNG and PDF. I was a little disappointed
that the PDF option didn't allow me to store more
than a single page in the file.



Figure 2. Image Scan! 


As a bonus, the driver package I installed fixed
the problem with xsane. Now when it starts, it gives
me the option to choose my Webcam (weird) or my
scanner. This also solves the problem of not being
able to scan multiple pages into a single PDF,
because xsane has that feature. The key is to
change xsane into multipage mode before I start
scanning. This allowed me to scan several pages
and save them as a single PDF.

Another Option 

I am running Karmic (9.10), but by the time this
article is printed, Lucid (10.04) will be released.
Simple Scan is being included as part of the release.
Simple Scan is a new scanning tool focused on making
scanning, in a word, simple. Because it currently
is available as part of a PPA (Personal Package
Archive), adding it to my system was easy:

sudo add-apt-repository ppa:robert-ancell/simple-scan
sudo apt-get update
sudo apt-get install simple-scan

The first time I tried Simple Scan, it failed. That
was a bit frustrating, but it was my own fault. I had
xsane open at the same time. It turns out that each
application claims ownership over the device. Once I
closed xsane, Simple Scan worked like a champ. It
really is simple, and it can do photos or text. Plus, it
made making a multipage document as easy as just
continuing to scan. xsane gives you incredibly
fine-grained control. There probably are situations where
I would be glad it includes a histogram of the image,
but when all I am trying to do is sign a contract,
scan it and e-mail a PDF, Simple Scan fits the bill.



Figure 3. Simple Scan 

Updates for Previous Columns

Updates on Qimo:

My instructions for getting Qimo to use more modern packages
were not detailed enough [see "A Desktop for Our Little Penguin"
in the February 2010 issue]. Unfortunately, the original computer I
built died before I could get the files off the drive. The good news
is that Qimo 2.0 should be out by the time you read this. That version
will be in sync with Lucid (10.04) and save you a lot of hassle.

Updates on APT Caching:

Eric Cooper, the author of Approx, contacted me about my comments
that his software did not handle multiple computers at once [see
"Installation Toolkit" in the March 2010 issue]. He pointed out
that the criticism might have been from an earlier version. The
current version of Approx uses inetd/xinetd, so it does not suffer
from that limitation. That means you have several good choices for
caching packages on your network! Sorry for the mistake, Eric. That's
what I get for reading a blog and being in a hurry.


One word of warning before I move on. After
spending some time using Simple Scan, I hit a problem.
Simple Scan depends on the convert command from
ImageMagick to make a multipage PDF from a series
of scans. On Karmic, this results in a segmentation
fault. I was able to locate a bug report confirming
that this problem has been resolved for Lucid, but
not for Karmic. There are three options: upgrade to
Lucid, pull in only the Lucid packages, or pull in an
updated ImageMagick from Raimar Sandner's PPA
(see Resources). Long term, I plan to upgrade to
Lucid. For now, I just used Sandner's PPA to pull in a
fixed version. I was able to confirm that this works.
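
To check whether an ImageMagick install handles the multipage case, the same kind of conversion can be tried by hand. This is only a sketch of that step, not the exact command Simple Scan runs. The helper name pages_to_pdf and the file names in the usage line are made up.

```shell
#!/bin/sh
# Hypothetical helper: combine already-scanned pages into one PDF with
# ImageMagick's convert. Page order follows the argument order.
# Usage: pages_to_pdf OUTPUT.pdf PAGE1 PAGE2 ...
pages_to_pdf() {
    out=$1
    shift
    # -adjoin asks convert to write all input images into a single file.
    convert "$@" -adjoin "$out"
}

# Example call (commented out; it needs real scans in the current directory):
#   pages_to_pdf contract.pdf page1.png page2.png
```

On an unpatched Karmic system, I would expect a call like this to hit the same segmentation fault described in the bug report, so it doubles as a quick test after installing the fixed package.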

Success Breeds Success? 

After having such a pleasant experience getting my 
scanner working, I realized I was up for more of a 
challenge. At my office, I have an HP LaserJet 3055. 
It's one of those multifunction copier, scanner, 
fax and printer machines. The v300 scanner I 
used at home was connected directly to my Linux 
box. In the case of the HP, I have to connect over 
the network. Will it be as simple to set up as the 
single-function Epson? 

At first, I wasn't sure where to start. There didn't 
seem to be any tools to detect the scanner on the 
network. Then I realized I had already been through 
this process. Step one with the scanner is finding 
out whether special drivers exist. In this case, the 
key phrase is "HP Linux Imaging and Printing" 
system, or HPLIP for short: 

sudo apt-get install hplip-gui

Once that was done, I needed to run hp-toolbox
to configure the network printer. This provides a
handy icon in the GNOME alert bar that lets you
select actions for the multifunction printer. By
choosing Scan, it automatically starts xsane. Under
Preferences in the HP system tray application, I found
the setting to make Simple Scan the default
application instead. For some reason, Simple Scan refused
to find the HP scanner when I launched it from the
system tray. At the moment, I'll chalk that up to
something that will get ironed out, as a bug report
says the switch will be official in Lucid (see Resources).
On the plus side, now that the HP is completely 
installed, I can start xsane or Simple Scan and then 
choose the HP for scanning. Scanning under Linux 
really has come a long way. 

The Final Challenge 

I now have both scanners working, so I decided to 
dig a little deeper into scanner configuration. My
Epson scanner has three hardware buttons: PDF, 
Email and Copy. It would be cool if I could press the 
PDF button and have it automatically start Simple 
Scan and start scanning. To do that, I needed to 
use a tool called scanbuttond. This is a daemon that 
runs in the background and monitors the status of 
the scanner buttons. It then allows you to trigger 
scripts based on the button pushed: 

sudo apt-get install scanbuttond

I started the daemon by running:

scanbuttond -f

The application still logs to /var/log/syslog, but the -f option
keeps it in the foreground, so it is easier to kill as I work
through the configuration issues. Right away, things got
off to a bad start. My log file quickly filled with:

Now I know why they provide a -q option to quiet the log
messages. The problem seems to be that the system doesn't
know how to detect my scanner. After spending some time on
Google, I determined that the way to solve the identification
problem is to modify the source of scanbuttond to detect the
scanner. Using lsusb, I was able to find the code to add to
scanbuttond. I recompiled the package and got a notification
that it found my scanner.

After all of that, I learned that even
more work was needed. scanbuttond
uses libusb to communicate with the
scanner. This allows it to talk to the
scanner without locking it up (the
way xsane and Simple Scan do). As
a result, in order to get the button
presses, you have to know precise
codes to send to the scanner to get a
response. Once I realized that, I was
able to confirm that the Epson scanner
I have acts completely differently from
the other scanners that scanbuttond
knows about. If I knew more about
USB debugging, I might have had a
shot at fixing the problem.

Conclusions

It was a little disappointing that I couldn't
get the buttons on the scanner to
work, but that ended up being the only
roadblock in the whole process. The
main lesson is to get the driver for
your scanner, then worry about the
rest. Scanning under Linux has
improved a lot since I last played
with it, and I'm really excited to take
something off the list of things I have
to do on another operating system.


Dirk Elmendorf is cofounder of Rackspace, some-time
home-brewer, longtime Linux advocate and even
longer-time programmer.


Resources

SANE: www.sane-project.org

SANE—Supported Devices: www.sane-project.org/sane-supported-devices.html

Avasys: www.avasys.jp/lx-bin2/linux_e/scan/DL1.do

Bug Report: ImageMagick crashes when using adjoin to make a
multipage pdf (karmic): https://bugs.launchpad.net/ubuntu/+source/imagemagick/+bug/551484

Raimar Sandner's Ubuntu Karmic ImageMagick: convert jpg to pdf
segmentation fault: homepage.uibk.ac.at/~c705283/archives/2010/03/19/ubuntu_karmic_imagemagick_convertjpg_to_pdf_segmentation_fault/index.html

Bug Report: scan utility should now be simple-scan:
https://bugs.launchpad.net/ubuntu/+source/hplip/+bug/539015


The Linux Box's Enkive

Former President Nixon would have balked at Enkive, a new open-source e-mail archiving and retrieval application 
from The Linux Box. That's because Enkive captures e-mail messages as they arrive or are sent to ensure they are 
retained before a worker can delete them in an e-mail client. This feature helps organizations address the issues 
of compliance with laws and regulations governing communications, as well as litigation support. It permits 
recovery of e-mail in full support of an organization's retention policies. In addition, storage costs are 
reduced by eliminating the capture of redundant messages and attachments. 
www.linuxbox.com 


RackForce's ddsCloud Enterprise 

The team at RackForce has announced availability of ddsCloud Enterprise, an enterprise-level hosted private cloud 
solution. RackForce describes ddsCloud Enterprise as a fully virtualized network, storage and compute capacity 
in an on-demand model that utilizes best-in-class technologies from Cisco, IBM, Microsoft and VMware. Built on 
RackForce's new state-of-the-art GigaCenter infrastructure, the firm says the results are "unprecedented scalability, 
flexibility and greenness". ddsCloud Enterprise leverages virtualization and unified fabric to combine computing, 
network and storage into one seamless system. When compared with previous computing models, RackForce asserts 
that it has seen deployment times reduced by 85%, customer costs by up to 30% and a carbon footprint merely 
1/50th the size of other cloud offerings located in conventional North American data centers.
www.rackforce.com 



Lucene in Action, 2nd Ed. (Manning)

The editorial duo of Erik Hatcher and Otis Gospodnetic has updated the book Lucene in Action
from Manning Publications to a new 2nd edition. The 500-pager is touted as the definitive 
guide to Lucene, an open-source, highly scalable, super-fast search engine that developers can 
conveniently integrate into applications. Since the first edition, Lucene has grown from a nice- 
to-have feature into an indispensable part of most enterprise apps. The book explores how to 
index documents; introduces searching, sorting and filtering; and covers the numerous changes 
to Lucene since the first edition. All source code has been updated to current Lucene 2.3 APIs. 
www.manning.com 



Howard Davies and Beatrice Bressan's A 
History of International Research 
Networking (Wiley) 

Publisher Wiley calls A History of International Research Networking "the first book written and 
edited by the people who developed the Internet", and it covers the history of creating universal 
protocols and a global data transfer network. Editors Howard Davies and Beatrice Bressan,
two veterans of the CERN particle physics research lab, are two of many insiders who 
contribute with perspectives never before published on the historic, technical development 
of today's indispensable Internet. 
www.wiley.com 
cPacket's cVu320G Network Appliance

The company cPacket is now marketing the cVu320G network appliance, a solution for data centers, service 
providers and telecommunications that enables on-demand capacity management, resource allocation and 
real-time troubleshooting of bursts and spikes. The cVu320G provides complete packet inspection filtering, flexible 
traffic aggregation, selective duplication and flow-based load balancing, as well as granular, wire-speed performance 
monitoring for 32 10-Gigabit links. cPacket's rationale for the application is threefold: first, today's data centers 
struggle with the growing stampede to 10 Gigabit and the increasing virtualization of platforms and services; 
second, monitoring tools have not kept pace with these developments, and, as a consequence, data centers are 
being overwhelmed with huge volumes of complex traffic, which they no longer have the visibility to control; and 
third, the consequences include intermittent and frequent congestion, performance degradation and major service 
disruptions to end users that are becoming increasingly common. The solution is based on cPacket's unique, 
20-Gigabit "complete packet inspection" chips and Marvell's 10-Gigabit Prestera switch. 
www.cpacket.com 

Napatech's NT20E2 Network Adapters 

With the introduction of the NT20E2 Capture adapter and NT20E2 In-line adapter
products, Napatech recently unveiled what it calls "the world's first 2x10 Gbps Intelligent
Real-time Network Analysis adapters". Napatech has positioned the NT20E2 In-line
for applications that require both capture and transmit in real time, such as intrusion
prevention systems and policy enforcement applications operating at 10Gbps line-speed.
The former is complemented by the NT20E2 Capture adapter, which provides full
20Gbps packet capture throughput over the PCI-Express Gen 2 bus. The NT20E2 is drop-in-compatible with existing NT20E cards
and is supported by the same driver software as other Napatech network adapters on Linux, FreeBSD and Windows.
www.napatech.com 
Synaptics Gesture Suite

A big welcome to the Linux family is in order for Synaptics, whose Gesture Suite Linux 
(SGS-L) for its TouchPads is now available on a number of Linux variants. The solution 
allows OEMs that offer Linux-based solutions to provide their users "a powerful and 
intuitive way to be more productive and interactive with their Linux-based notebook 
systems". SGS-L supports a wide range of pointing enhancements and gestures, including 
two-finger scrolling, PinchZoom, TwistRotate, PivotRotate, three-finger flick, three-finger
press, Momentum and ChiralScrolling. It is provided free of charge to Synaptics OEM/ODM
partners when ordered with Synaptics TouchPad and ClickPad products. 
www.synaptics.com/go/SGSL 


Numerical Algorithms Group's NAG 
Library for SMP and Multicore 

Application developers seeking two essential things—better use of the processing 
power of multicore computer systems and an easy way to migrate existing applications 
to multiprocessor architectures—can go and get the Numerical Algorithms Group's 
NAG Library for SMP and Multicore. The company points out how mathematical and 
statistical algorithms optimized for performance on multicore architectures have 
become key to progress in various aspects of technical application development and 
computationally intensive problem solving. The library contains more than 1,600 
routines, including more than 100 new ones for this release. 
www.nag.com 



Please send information about releases of Linux-related products to newproducts@linuxjournal.com or New Products 
c/o Linux Journal, PO Box 980985, Houston, TX 77098. Submissions are edited for length and content. 
Fresh from the Labs


marave—Stylish Text Editing 

code.google.com/p/marave 

If you're the kind of person who's 
been using Blackbox and its derivatives 
for the past decade, the kind of 
person who has just a single CD in 
a spotless but stylish car, the kind of 
person who likes minimalism but with 
effortless style, then boy, have I got a 
project for you. To quote the marave 
Web site: 

Inspired by ommwriter and 
other similar projects, marave 
(it means "nothing" or "it 
doesn't matter" in guarani) 
aims to be a simple, clean text 
editor that doesn't distract you 
from your writing. 

You can have a nice background,
or just a color. You can
have a real-time spellchecker or
not. Syntax highlighting or not.
You can have background music,
keyboard feedback or silence.
marave will try to be the way
you want it to be.

Installation Project maintainer 
Roberto Alsina is hoping to integrate 
marave into most distros soon, but for 
now, the only packages available are 
for Arch Linux and Fedora. If you use 
another distro, your only option is 
the source, but that's okay, because 
installing the source is pretty easy. 

In terms of requirements, the 
documentation says you need the 
following libraries: 

■ GNU source-highlight (www.gnu.org/software/src-highlite).

■ Source-highlight-qt (srchiliteqt.sourceforge.net).

■ SIP, which should come with PyQt.

■ A C++ compiler.

Assuming you're going with the 
source, head to the Web site, grab the 
latest tarball, extract it, and open a 
terminal in the new folder. 




marave can use different themes (or none at all), as well as play Internet radio. Here, I’m using it
to edit this month's article. 


If your distro uses sudo, enter:

$ sudo python setup.py install

If your distro doesn't use sudo, enter:

$ su
(enter your password)
# python setup.py install

Once that's done, run marave with 
the command: 

$ marave-editor 


Usage The first thing you'll notice 
when you're inside is that the entire 
desktop has disappeared and you are 
in a single full-screen program. This is 
unabashed full-screen editing, designed 
to immerse you and cut out distractions. 
As if to reinforce this ethic, the few
existing GUI elements on the side disappear
until you move your mouse again,
leaving you with only your text, a
blinking cursor and a scroll bar.

But, enough of the straight minimalism.
What really impresses me is
the look of the thing. It's a sleek and
undeniably gorgeous environment in
which to work. An amusing touch for
those who like a bit of flair (and have
possibly been watching too many
Hollywood movies) is a click noise
for whenever you press the keyboard,
adding a bit of romance to the otherwise
dreary world of typing.

As for some of the other features, 
it's time we explore those GUI elements 
that usually are hidden away to the 
right of the scroll bar. The first button at 
the top allows you to change the font, 
as well as the color. The magnifying 
button has a submenu to zoom your 
text in and out, which is actually one 
of my favorite features of this program. 
The blank sheet of paper button has 
a menu with all the usual functions of 
loading, saving and so on. 

Further down is an icon that looks 
like a camera. The left and right buttons 
switch between desktop backgrounds, 
including various snowy nature themes
and what appears to be a Debian background.
The color wheel at the right
also allows you to adjust the background 
color and get rid of the background 
picture completely, if you so desire. The 
next button gives you amusing control 
over what sort of keyboard click noise 
you'd like (or whether to disable it). 

Next up, there's a music button that lets 
you play what I think is streaming music 
(as well as turn it off). I'd go into this 
more, but space and documentation 
are kind of lacking. 

Second from the bottom is a button 
that looks kind of like a cricket bat, 
which appears to bump the text 
around, but I'm not sure I can elaborate 
much further on it. I found there are a 
number of GUI customization options 
to move around all of your working 
elements, such as the text area size 
and placement, but I also ran into 
some confusion (I deleted the config 
file to reset in the end, after getting 
myself into some UI trouble). And as
I already mentioned, documentation 
still is lacking. 

Something that really impressed 
me was marave's handling of foreign 
characters. A file of mine that had both 
Japanese and Greek characters mixed 
in with the Latin alphabet displayed 
without the slightest hiccup. 

I can't help but feel that with a bit
of modifying, marave also would make
a brilliant ebook reader if it could
handle files such as PDFs. Perhaps, in
the short term, someone could tack on
some code that uses a PDF-to-text
converter, such as pdf2ascii, and then
just pipes the output to the screen? An
environment as cool as this one, with a
full-screen interface, no intruding GUI
elements and zooming text, easily would
dissuade me from getting a commercial
ebook device in favor of simply using
marave on a basic Netbook.

What ultimately draws me to the
project is that it doesn't just have
minimalism and simplicity, it has minimalism
and simplicity combined with beauty
and a palpable design ethic. marave has
soul, and I love that.

Storybook—Novel Writing 
Organizer 

storybook.intertec.ch 

Before I begin, I have to offer a mea 
culpa. I wrote about this several months 
ago in the Projects at a Glance section, 
promising to cover it the following 
month, but it got lost in the noise, and I 
remembered it when reading over the 
section in LJ months later. Apologies 
to any Storybook fans and developers! 
Anyway, on with the show. 

To quote the project's Web site: 

Storybook is a free (open source) 
novel-writing tool for creative 
writers, novelists and authors 
that will help you keep an 
overview of multiple plot-lines 
while writing books, novels 
or other written works. 

Storybook assists you in structuring 
your book. Store all information 
about your characters and 
locations in one place. Then, 
use the included Storybook 
features for managing chapters, 
scenes, characters and locations. 

A simple interface is provided 
to enable you to assign your 
defined characters and locations 
to each scene and to keep an 
overview of your work with 
user-friendly chart tools. 

Installation As far as requirements 
go, the only one that jumped out was 
Java 6 (Storybook won't work with 
Java 5 or earlier), which shouldn't be 
much hassle. 

Given that characters typically are the most
important part of a story, Storybook's character
options are wonderfully thought out. Note the
color assignment option—a brilliant touch for
creative types.

Available from the Web site is what
appears to be a distro-neutral tarball.
Download and extract the file, and
open a terminal in the new folder.
Once there, enter

Usage Although I can't really give 
you a whole rundown on how to use 
Storybook (that would require an article 
all its own), I at least can introduce you 
to the main elements and highlight the 
coolest parts of the program. Thankfully, 
the Web site has a very gi 
have a gander if you wan 


things further. 

When you first 
enter the program, 
it prompts you 
for a project title. 
Once you're past 
that, you'll be in 
the main screen 
where you can 
start exploring. 

The largest part 
of the window 
is called the 
Chronological 
View, which 
shows your scenes 
in chronological 
order (as this 
project is still 
new, this window 
>r the meantime). 

To the right, in the top section, is 
the Object Tree. This shows all the 
objects involved in this story thus far 
(such as characters, scenes, locations 
and so on) in a hierarchical order. In 
the bottom section is the Quick Info 
y much does what it 
o for each object 
you're looking at in the upper pane 
(the Object Tree). 

As for actually taking your first steps
in Storybook, you need to begin with
new characters, scenes and so on. The
toolbar at the top with the icons will be
your best friend here. The first icon is
Open (ignore for now), but continuing
right are New Scene, New Chapter, New
Character and New Location. Each of
those has very well thought-out dialog
screens that link to other sections of
the program.

For instance, the New Scene dialog
allows you to link individual or multiple
characters to a scene, as well as individual
or multiple locations. The New Location
dialog lets you be very detailed, giving
you the chance to assign a name,
address, city and country to this
location, as well as a large description
box to flesh out as many details about
this place as possible.

However, it's the New Character 
dialog screen that's particularly well 
thought out. Each character can be
assigned first and last names,
abbreviations, gender, birthday,
date of death and occupation. They
even can be assigned a color. But, the 
most important feature is defining 
whether they are a Central Character 
or a Minor Character, which then 
affects the rest of the information 
throughout Storybook—masterful. 

But what is the point of all this 
categorization, you may ask? Well, 
it allows you to see patterns in your 
story and give it structure much earlier 
in the process than a bare-bones, 
traditional pen-and-paper approach 
would allow. Are you overusing a 
character? Have you broken a piece 
of continuity somewhere or perhaps 
lost or missed out on some vital context 
to the story? Storybook likely will show 
it long before you see it yourself.

John Knight is a 25-year-old, drumming- and climbing-obsessed
maniac from the world's most isolated city—Perth,
Western Australia. He can usually be found either buried in an
Audacity screen or thrashing a kick-drum beyond recognition.


Brewing something fresh, innovative 
or mind-bending? Send e-mail to 
newprojects@linuxjournal.com.


Coyote Point Offers Application Balancing for Virtual Servers


Stressed budgets and the high demand for Web applications have made efficiency in
the data center priority number one, creating the need for inexpensive load balancing,
application balancing and SSL-acceleration products. Does Coyote Point have what it
takes to succeed in the enterprise?

FRANK J. OHLHORST


Thanks to the low cost of open-source solutions and the 
falling prices of hardware, network managers are finding it 
easier than ever to build out networks and grow the capabilities 
of the data center. Network managers quickly can meet the 
increasing demands of Web services, remote access, VPNs and 
many other services by provisioning new inexpensive servers 
with open-source software, which has fueled the exponential 
growth of server clusters and Web applications. However, 
many network managers are discovering that just throwing 
additional servers into the mix is an inefficient way to balance 
loads across server farms. High server demand can create a
cascade effect across the servers, with a service request sent
to the primary server and moving on to the next server only 
after the primary server has become saturated. 

What's more, server virtualization solutions magnify the 
problem. Virtualization makes it even easier and quicker to build 
out multiserver solutions, yet virtualized servers still rely on that 
same round-robin approach to meet high demand. That proves 
to be very inefficient and a waste of processor cycles, and it 
negates most energy savings offered by virtualization. 

Many data-center administrators have turned to third-party 
products to help mitigate those load-related inefficiencies, creating 
a healthy market of load-balancing and traffic-acceleration 
solutions. Numerous products, ranging from open-source software 
to hardware appliances that cost tens of thousands of dollars, 
are all fighting for market share and promise to be the best 
way to manage loads across servers and sites. However,
traffic-balancing products are not created equal, and selecting the
correct load balancer can be fraught with uncertainty.

Coyote Point Systems entered the fray more than 15 years 
ago, with the ideology that network traffic management was 
the key to maximizing bandwidth and services availability to 
endpoints. Contrary to what other vendors were doing in the 
1990s, Coyote Point chose to go the route of building traffic 
management capabilities into an appliance. The goal was to 
replace bulky software solutions with an easy-to-manage device. 

Although Coyote Point was a pioneer in the world of
traffic-shaping and load-balancing appliances, other companies,
such as F5 Networks, Barracuda Networks and Cisco, also
have focused on providing hardware-based load-balancing
solutions, creating a crowded field of contenders, where each
vendor is looking to tout specialized capabilities to become
the appliance of choice.

Figure 1. Feature Chart (rows include connectivity, throughput and Envoy global server load balancing)

Figure 2. Coyote Point E650GX

Coyote Point has chosen to up the 
ante with the launch of a new series of 
load-balancing appliances, which are 
virtual server-aware. What's more, the 
company is looking to shift the focus 
from layer 4 load balancing to application 
load balancing, where the appliance 
is aware of payload as well as raw 
traffic. Coyote Point's application load-balancing
appliances can shape traffic
and efficiently distribute loads across
multiple servers, even if those servers
are virtual in nature. The growth of
virtualization solutions in the data center 
has made it critical for traffic-shaping 
and load-balancing appliances to integrate 
with virtual server solutions. 

Coyote Point offers four different
appliances. Those four appliances differ
in designed traffic load and secondary
features, yet all share the same management
console and basic feature set.

Application Load Balancing— 
the Coyote Point Way 

I tested the E650GX (V8.6) load-balancing 
appliance for ease of use, feature set, 
performance and suitability to task. I
found that the device is very simple to
install; the physical portion of the installation
consists of plugging in the device
and routing the appropriate Ethernet 
cables to the unit. The E650GX is 
Coyote Point's top-of-the-line appliance 
and sports 22 Gigabit Ethernet interfaces 
for connecting server clusters. 

I spent more time figuring out 
my cabling than I did configuring the 
device. Making sure your cabling goes 
to the appropriate servers is one of the 
most important steps for deploying a 
Coyote Point appliance. You have to be 
certain that you are plugging your server 
farm in to a load-balancing port on the
device. In complex environments, it is
easy to forget that a particular network 
segment is plugged in to a different 
router or switch from what you originally 
thought. However, on smaller networks, 
you simply can plug in the connection 
from your firewall to the external port 
on the E650GX and then plug each 
segment of the LAN in to the internal 
ports on the device. All ports on the 
E650GX are Gigabit Ethernet and 
support full duplex operation. That means 
it is very unlikely the device will introduce 
any bottlenecks into the LAN or WAN 
connections, and none were detected 
during performance testing. 

I found the rest of the setup process
plug-and-play easy. After I connected
all the cables, I was able to access the 
E650GX's management console using a 
Web browser. The management console 
is based on AJAX technologies, creating 
a rich user interface, which is easy to 
navigate and sports context-sensitive help. 
The interface was designed using the Dojo 
toolkit, a JavaScript development toolset, 
which helps give the management 
interface a professional look and feel. 

The E650GX works pretty well right out of the box. All you
need to do to start load balancing is to set up some basic
parameters. One of the first steps is to connect your servers.
For example, if you have nine servers running a Web
application, you would plug each of those servers into an
internal port on the E650GX appliance. The next step consists
of defining your server clusters. For example, you may want to
divide those nine servers into three clusters. That proves very
easy to do and easy to modify, if you need to change anything.
All clusters are defined logically, allowing a great deal of
flexibility.

I found that the E650GX offers many options when it comes to
load balancing. You can use "Match Rules and Custom Load
Balancing Policies" to build policies based upon layer 4
requests or layer 7 requests, or even to create custom policies
using Boolean logic. The layer 4 policies offer basic
load-balancing capabilities, based on algorithms such as least
connections, fastest response, adaptive and round-robin, as
well as an agent-based algorithm that is accurate provided the
agent is run on each server. Layer 7 policies actually look at
the content of the traffic to determine how to load balance it.
For example, certain protocols or applications can be used to
trigger a load-balancing policy to route traffic to a
particular cluster. Policies based on Boolean logic take into
account particular requests, based on a series of
administrator-defined events. Those policies can be used to
reroute traffic if a server fails to respond (failover routing)
or to route based on a schedule.
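
To make the policy names above concrete, here is a minimal Python sketch of round-robin, least connections and a layer 7 style match rule. It is not Coyote Point's implementation; the server names, connection counts and URL prefixes are all hypothetical, chosen only to show how the three decisions differ.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless rotation over the servers, ignoring load."""
    return cycle(servers)


def least_connections(active):
    """Return the server with the fewest active connections.

    `active` maps server name -> current connection count.
    On a tie, the server listed first wins.
    """
    return min(active, key=active.get)


def pick_cluster(path, rules, default):
    """Layer 7 style match rule: choose a cluster by URL path prefix."""
    for prefix, cluster in rules:
        if path.startswith(prefix):
            return cluster
    return default


# Hypothetical servers and rules, for illustration only.
rotation = round_robin(["web1", "web2", "web3"])
first_four = [next(rotation) for _ in range(4)]

active = {"web1": 12, "web2": 3, "web3": 7}
choice = least_connections(active)

rules = [("/images/", "static"), ("/cart/", "checkout")]
cluster = pick_cluster("/cart/add", rules, default="general")
```

Round-robin needs no knowledge of the servers at all, which is why it is cheap but blind to load; least connections needs a live count per server, and the layer 7 rule needs the request itself.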

Once you have defined your clusters, 
you then can define rules to handle 
traffic flow and load-balancing decisions. 
Coyote Point calls those definitions Smart 
Events. The rules are based on a number 
of parameters, such as server load, traffic 
type, server weighting and connection 
persistence. The underlying technology
that allows the E650GX to make traffic-routing
decisions is very complex.
However, the E650GX does an excellent
job of hiding that complexity by using
rule-creation wizards and a commonsense
procedural layout to make rule definition
very easy. That ease of configuration
is rarely found in software-only load-balancing
solutions and allows even
newbie network administrators to set up
basic load balancing with the E650GX.

I found that one of the most impressive features of the E650GX
was the unit's ability to work with VMware's vSphere products.
That brings application load balancing and traffic shaping to
the world of virtual servers. Coyote Point has built in support
for VMware's APIs, allowing the E650GX to judge the load on a
virtual server, then route requests based on virtual loads and
administrator-defined load-balancing policies. What's more,
Coyote Point has included support for IPMI-capable servers. The
Intelligent Platform Management Interface (IPMI) is a
specification that allows third-party products to power on and
power off servers, as well as remotely execute other commands.
Simply put, you can define a policy that automatically turns on
a server when traffic loads hit a certain level, and then shuts
off that server once traffic load drops.
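
The power policy described above boils down to two thresholds. The sketch below shows only that decision logic; the threshold values and function name are hypothetical, and the actual IPMI power command the appliance would send is deliberately left out.

```python
def power_action(load, powered_on, on_threshold=0.80, off_threshold=0.30):
    """Decide what to do with a standby server.

    load: current cluster load as a fraction between 0.0 and 1.0
    powered_on: whether the standby server is currently running
    Returns "power_on", "power_off" or "no_change".

    The thresholds are illustrative assumptions, not product defaults.
    """
    if not powered_on and load >= on_threshold:
        return "power_on"
    if powered_on and load <= off_threshold:
        return "power_off"
    return "no_change"
```

Keeping a gap between the two thresholds means a load hovering around a single value does not switch the server on and off repeatedly.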

I also found it very easy to segment
LANs using the product's VLAN capabilities.
Administrators define VLANs based
on IP address segments, and the unit's 
built-in routing capabilities keep traffic 
isolated on a VLAN for local requests. 
That can help reduce latency and speed 
up requests by keeping the appropriate 
traffic on the same logical segment. 
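
As a rough illustration of VLANs defined by IP address segment, the following sketch uses Python's standard ipaddress module. The VLAN names and address ranges are made up for the example, and the appliance's own routing is certainly more involved.

```python
import ipaddress

# Hypothetical VLAN names and address segments.
VLANS = {
    "web": ipaddress.ip_network("10.0.1.0/24"),
    "db": ipaddress.ip_network("10.0.2.0/24"),
}


def vlan_for(address):
    """Return the name of the VLAN whose segment contains the address."""
    ip = ipaddress.ip_address(address)
    for name, network in VLANS.items():
        if ip in network:
            return name
    return None


def stays_local(src, dst):
    """True when both endpoints sit on the same defined VLAN."""
    vlan = vlan_for(src)
    return vlan is not None and vlan == vlan_for(dst)
```

A request for which stays_local returns True never needs to leave its segment, which is where the latency saving comes from.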

Ease of use permeates the interface, making it simple to set up
many, if not all, of the unit's capabilities. You also will
find that ease of use present in the device's dashboards and
reporting menus. The dashboards offer a quick snapshot of how
the device is performing and what traffic is flowing across the
device. Reports offer a historical reference of many monitored
parameters and can be useful for fine-tuning the unit.

Performance was tested using a Spirent Avalanche 2900 configured to
generate traffic load on the E650GX. The goal of the performance testing was
to rate layer 7 HTTP transactions per second, layer 7 HTTPS throughput,
layer 4 maximum concurrent connections and layer 4 connections per
second. Those metrics are a good indicator of the processing power of the
device, as well as the overall capacity of the device.

■ Layer 4 connections per second: 160,000 CPS
■ Layer 4 maximum concurrent connections: 17.5 million
■ Layer 7 HTTP transactions per second: 110,000 TPS
■ Layer 7 HTTP throughput: 1,300 Mbps
■ Layer 7 HTTPS transactions per second: 14,000 TPS
■ Layer 7 HTTPS throughput: 825 Mbps

Although the E650GX's primary 
focus is on application load balancing, 
the unit also includes other features 
that help speed up network access 
and reduce latency. Those features include 
SSL acceleration, HTTP compression and 
global/geographic load balancing. SSL 
acceleration helps reduce the latency 
found in HTTPS requests by offloading
the packet encryption onto the device.
HTTP compression helps reduce latency 
by compressing and optimizing HTTP 
requests, while global/geographic 
load balancing can be used to balance 
traffic across geographical clusters, 
placing requests on servers that are 
closest to the user, as far as latency 
and bandwidth are concerned. 


Administrators supporting e-commerce solutions will appreciate
the E650GX's ability to deliver persistent connections.
E-commerce transactions rely on a reliable connection between
the client PC and the server providing the transaction; if
either endpoint loses track of the other or traffic is routed
incorrectly, the e-commerce transaction will fail. The E650GX
solves that problem by creating a persistent connection between
the client PC and the server using cookies, which are inserted
into the HTTP response returned to the client. That ensures the
client will return to the same server in the cluster.
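
Cookie-based persistence of this kind can be sketched in a few lines of Python using the standard http.cookies module. The cookie name, server names and fallback function are assumptions made for the example; the format the appliance actually uses is not described here.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "lb_server"  # hypothetical cookie name


def choose_server(cookie_header, servers, fallback):
    """Return (server, cookie_to_set), where cookie_to_set may be None.

    cookie_header: the raw Cookie header sent by the client ("" if none)
    servers: names of the servers currently in the cluster
    fallback: function that picks a server for a client with no valid cookie
    """
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    if COOKIE_NAME in cookie and cookie[COOKIE_NAME].value in servers:
        # Returning client: keep it on the server it used before.
        return cookie[COOKIE_NAME].value, None
    # New client, or its old server is gone: pick one and remember it.
    server = fallback(servers)
    out = SimpleCookie()
    out[COOKIE_NAME] = server
    return server, out[COOKIE_NAME].OutputString()


servers = ["web1", "web2", "web3"]
first = choose_server("", servers, fallback=lambda s: s[1])
second = choose_server("lb_server=web2", servers, fallback=lambda s: s[0])
```

On the first request the balancer picks a server and hands back a cookie; on every later request the cookie overrides whatever the balancing policy would otherwise have chosen.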

The E650GX supports an 
active/passive failover model for sites 
that need guaranteed uptime. Failover 
works by transferring the Equalizer 
configuration to a backup device 
(which can be a lower-end model in the 
Coyote Point family), so that persistent 
client/server connections are maintained 
even when the primary unit fails. 

Coyote Point has re-invented the idea of load balancing by
shifting traffic shaping from basic layer 4 algorithms to layer
7, application-aware calculations. That approach has created a
new market segment called application traffic shaping. Coyote
Point also bundles in other advanced capabilities, ranging from
SSL acceleration to VLAN definition to VMware vSphere support,
making the device a complete traffic-acceleration solution.
Coyote Point is very adept at providing an acceleration
solution for almost any server environment that can benefit
from clustering and traffic management. The top-of-the-line
E650GX has an MSRP of $14,395 and comes with one year of
support included. Although $15K may seem like a big chunk of
change, Coyote Point's price is less than half of what some
larger competitors charge.


Frank J. Ohlhorst is a freelance technology journalist, professional
speaker and technology business consultant who covers
several topics for many major publications. Frank currently
writes Virtualization and Application development articles for
Tech Target, Operating Systems and Security articles for
Computer World, Technology How-Tos for Ars Technica and
Channel-focused product reviews for ChannelTechCenter.com.
Frank also has been the Executive Technology Editor for eWeek
and the Lab Director of CRN's Test Center.


Got lots of systems? Make them into a cloud computing cluster
with Eucalyptus!


Build Your Own Cloud with Eucalyptus



BILL CHILDERS 



In the March 2010 issue, I wrote an article 
on how you could deploy Ubuntu 9.10 as part 
of Amazon's EC2 cloud computing service. 
Amazon's EC2 service can be useful, but what if 
you have a bunch of machines already and don't 
want your data outside your network? Or, what 
if you don't want to pay the ten-cents-per-hour 
fee that Amazon charges? That's where the 
Ubuntu Enterprise Cloud comes in. The Ubuntu 
Enterprise Cloud system ships with the Server 
Edition of Ubuntu 9.10, and it's based on the 
Eucalyptus cloud cluster software. 

What exactly is Eucalyptus? Put simply,
Eucalyptus is an open-source, Amazon EC2-compatible,
cloud computing cluster package
that can be run on commodity Linux machines.
Although VMware and VirtualBox do similar
things, Eucalyptus allows you to scale your cluster
across multiple machines. When you run out of 
resources to run another VM, you simply can pop 
a new Eucalyptus server on your network, and 
you're off and running.

Installing Your Ubuntu Enterprise Cloud

Now that you're all fired up, let's get started with the 
Ubuntu Enterprise Cloud (UEC). The easiest way to get 
started with UEC is to do a fresh installation of Ubuntu 
9.10 Server with the UEC option. You need two systems at 
a minimum to build your first cloud. One will be the cloud 
controller (the master node that dispatches and monitors 
the instances of the virtual machines), and the other will 
be the node controller (where all the instances actually will 
run). Minimum configurations are listed in the installation 
documentation (see Resources), but I recommend a dual-core, 
2GHz machine with 2GB of RAM and a 100GB disk as a 
realistic usable minimum for each. Note that you will need 
Virtualization Extensions (VT) enabled on the node controller 
machine. Eucalyptus requires that. Your systems can be 
either 32-bit or 64-bit (mine are both 64-bit), but be 
advised that although the 64-bit host can run a 32-bit 
instance, the opposite is not true. 

First, let's install the cloud controller. To start the install, 
boot your machine off an Ubuntu 9.10 Server CD, select 
Install Ubuntu Enterprise Cloud at the boot menu, and 
then press Enter. That starts the standard text-based install, 
with a twist: along the way, you'll be asked what type 
of cloud installation mode you want—a "Cluster" or a 
"Node". Because you're installing the cloud controller first, 
select Cluster and press Enter. The installer will proceed 
normally, but it will ask you two more questions unique to 
the Ubuntu Enterprise Cloud installation: the name of your 
cluster (this is just a unique identifier like "testcluster") 
and a range of IP addresses on your LAN that the cloud 
controller can allocate to instances. Once you've done 
that, the installer will finish out much like a regular Ubuntu 
text-based install, and your machine will reboot. That's it! 
Your cloud controller is now on-line. 

Next, you need to install a node controller. This is even 
easier. Boot the computer that will become the node
controller from the same Ubuntu 9.10 Server CD, select Install
Ubuntu Enterprise Cloud from the boot menu, and the 
installer should detect the cluster automatically and select 
Node within the installer. Simply press Enter to confirm you 
want to install a node and confirm your system's partitioning 
scheme, and the rest of the installation is completed for 
you. The installer even copies your login account over from 
the cloud controller. 

Now that your nodes are up, you need to register the node 
controller with the cloud controller. Log in to the cloud controller, 
and run the command: 

sudo euca_conf --no-rsync --discover-nodes 

The cloud controller will auto-discover the nodes that are 
running the node controller service, and it will prompt you to 
register each by its IP address. 

Obtaining Access 

Before you can use the cloud, you've got to register yourself 
with it and obtain credentials. Fire up a Web browser (either 
on the cloud controller or on another machine on the LAN), 
and go to this URL: https://<cloud-controller-ip-address>:8443. 
You have to use a secure connection, and you'll get a
security certificate warning from your browser. Once you
accept the cert warning, use the user name "admin" and
password "admin" to log in to the page (Figure 1). Then,
you'll be prompted to change the admin password and fill
in your e-mail address, so the UEC can mail you information
about your instances.

Figure 1. The Ubuntu Enterprise Cloud Login Page

Next, you need to get your credentials to a location where 
you can use them. I prefer to do this on the cloud controller, 
so run this script as your regular user on the cloud controller: 

mkdir -p ~/.euca
chmod 700 ~/.euca
cd ~/.euca
sudo euca_conf --get-credentials mycreds.zip
unzip mycreds.zip

This drops your credentials for the UEC into the ~/.euca
directory. The credentials can be downloaded from the UEC
admin portal to another Ubuntu machine for use if you so
desire. Next, you need to add the line . ~/.euca/eucarc
to your shell's profile (~/.bashrc on an Ubuntu machine or the
cloud controller) to source the eucarc file every time your shell
starts. If you're on another machine aside from the cloud
controller, you need to install the euca2ools package as well.


What's the Walrus, CooCooCaChoo?

Eucalyptus includes a service known as the walrus. The walrus
service is a storage service that emulates Amazon's S3 storage.
This article covers the default installation of Eucalyptus that
runs the walrus service on the same system as the cloud
controller. If you have a server with a lot of disk space, it's entirely
possible to split the walrus service out and export hunks of disk
space as volumes to the virtual machines. In other words, it's a
free implementation of a virtual SAN for your virtual machines.
Unfortunately, getting into the specifics of the walrus goes
beyond what can be covered here.


Virtualization Extensions: Make Sure They're Enabled!

Eucalyptus requires that the CPU on the node controllers have
Virtualization Extensions (VT) enabled. Do yourself a favor and
go into your BIOS and check that it is enabled. Simply grepping
for "vmx" in /proc/cpuinfo isn't enough. The BIOS support
must be enabled as well. When preparing to write this article, I
burned up several days in testing to learn this fact. The virtual
machine appeared to start, then terminated immediately with
an obscure message in the nc.log file on the node controller
like [EUCAERROR ] libvirt: Domain not found:
no domain with matching name 'i-427C0881'
(code=42). Simply flipping the BIOS switch that enabled
Virtualization Extensions allowed the virtual machine to start
properly. Verify your BIOS settings before installation!
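
On the subject of Virtualization Extensions, the CPU flag check mentioned in the sidebar can be scripted as a first screening step before a node controller is installed. The sketch below is an assumption-laden illustration: it looks only for the "vmx" (Intel) and "svm" (AMD) flags in /proc/cpuinfo text, and, as the sidebar warns, finding a flag does not prove that the BIOS switch is on. That still has to be verified by hand.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware-virtualization CPU flags found in cpuinfo text.

    "vmx" marks Intel VT-x and "svm" marks AMD-V. An empty result means
    the node cannot run instances; a non-empty result only means the CPU
    is capable, not that the BIOS has the feature enabled.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            found.update(flag for flag in flags if flag in ("vmx", "svm"))
    return found


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a candidate node, virtualization_flags(read_cpuinfo()) gives a quick yes-or-no on CPU capability before you go on to check the BIOS.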

Now that the prep work is done, you can verify that the
cluster is working properly by running the
euca-describe-availability-zones verbose command:

bill@falcon:~$ euca-describe-availability-zones verbose
[Listing: seven AVAILABILITYZONE rows covering the cluster and its VM types, ending with the large and xlarge sizes; the column values are not recoverable.]

Installing Images on Your UEC 

Although it's possible to make your own custom images to 
run on your cloud (see Resources for a link on bundling 
images), it's far easier to get one from the UEC "store" 
(Figure 2). Simply access the cloud controller at the URL
https://<cloud-controller-ip-address>:8443/, enter your login
and password, click the Store tab, and you'll be presented 
with the UEC Store. Just find an image you'd like to install (at 
the time of this writing, there are only three), and push the 
Install button. Your image will download and install to your 
cluster automagically. Once that's done, you'll get a How to 
Run? link under the grayed-out Install button. If you click that 
link, you'll get the exact command line that will instantiate, or 
start, your selected image. 

Starting Your Image 

Instantiating an image requires you to use the command 
line on the cloud controller (or wherever you installed your 
credentials). Before you run your first image, you've got to 
create an SSH keypair so you can log in to your instance 
as root once it's up and running. The key is stored and is 
common across all your instances, so this script needs to 
be run only once: 

if [ ! -e ~/.euca/mykey.priv ]; then
    touch ~/.euca/mykey.priv
    chmod 0600 ~/.euca/mykey.priv
    euca-add-keypair mykey > ~/.euca/mykey.priv
fi


Next, configure the cloud to allow port 22 access (SSH) 
inbound for all instances. The following command will allow 
SSH from any source IP: 

euca-authorize default -P tcp -p 22 -s 0.0.0.0/0 

Now, you can fire up your first image: 

bill@falcon:~$ euca-run-instances emi-DF841070 -k mykey -t c1.medium
RESERVATION r-3409079E admin admin-default
INSTANCE i-46780864 emi-DF841070 0.0.0.0 0.0.0.0 pending mykey 2009-12-10T06:26:09.471Z eki-F59010E3 eri-0A2A115C

The first time you instantiate a particular image, it'll be 
slow to start. Eucalyptus caches the image on the node 
controller, so there's a sizable amount of data that's got to 
move to the node. You can keep tabs on the status of your 
image by running: 

watch -n5 euca-describe-instances 
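
Instead of watching the listing by eye, a script could read the same output and wait for the state column to say "running". The parser below is only a sketch: the field positions are an assumption taken from the sample euca-describe-instances listing printed in this article, and other versions of the tools may order the columns differently.

```python
def instance_states(output):
    """Map instance ID -> (state, LAN IP) from euca-describe-instances text.

    Assumed field order, based on this article's sample listing:
    INSTANCE <id> <image> <LAN IP> <private IP> <state> ...
    """
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "INSTANCE":
            states[fields[1]] = (fields[5], fields[3])
    return states


# Sample text adapted from the listing in this article.
sample = (
    "RESERVATION r-3409079E admin default\n"
    "INSTANCE i-46780864 emi-DF841070 192.168.1.170 172.19.1.2 "
    "running mykey 0 c1.medium\n"
)
states = instance_states(sample)
```

A wrapper could capture the command's output every few seconds and stop once the instance it cares about reports "running".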

You'll see two IP addresses listed in the output of the 
euca-describe-instances command. One will be an IP on 
your LAN, and the other will be a private IP. Once the instance 
is listed as "running", you can ssh to it on the IP listed in the 
output. Note that it doesn't have a user account with a password 
on it, so you need to use the SSH key created earlier: 

bill@falcon:~$ euca-describe-instances
RESERVATION r-3409079E admin default
INSTANCE i-46780864 emi-DF841070 192.168.1.170 172.19.1.2 running mykey 0 c1.medium 2009-12-10T06:26:09.471Z cluster1 eki-F59010E3 eri-0A2A115C
bill@falcon:~$
bill@falcon:~$ ssh -i ~/.euca/mykey.priv ubuntu@192.168.1.170
The authenticity of host '192.168.1.170' can't be established.
Are you sure you want to continue connecting (yes/no)? yes

Linux 172 2.6.31-14-server #48-Ubuntu SMP Fri Oct 16 15:07:34 UTC 2009 x86_64

System information as of Thu Dec 10 06:32:03 UTC 2009

System load: 0.0 Memory usage: 16% Processes: 70
Usage of /: 29.6% of 1.98GB Swap usage: 0% Users logged in: 0

ubuntu@172:~$

At this point, you're in your instance, and it's a fully
functioning system. You can apt-get packages like apache
or do further system configuration if you want. When you're
done, you can exit your SSH session, and then terminate the
instance by finding the instance ID from the output of the
euca-describe-instances command (in the example above,
it's i-46780864) and running euca-terminate-instances
<instanceID>. Your instance will then shut down.

This article barely scratches the surface of what's possible
with the Ubuntu Enterprise Cloud. Although it's less
flexible than other virtualization technologies like VMware
or VirtualBox, it is API-compatible with Amazon's EC2
service, and it allows you to build networks of virtual
machines far beyond what's possible with conventional
virtualization solutions. If you require a scalable network
of virtual systems that can be instantiated and terminated
dynamically, the Ubuntu Enterprise Cloud and Eucalyptus
are for you.


Bill Childers is an IT Manager in Silicon Valley, where he lives with his wife and two children. He
enjoys Linux far too much, and he probably should get more sun from time to time. In his spare
time, he does work with the Gilroy Garlic Festival, but he does not smell like garlic.


DHCP Issues

The Eucalyptus cloud controller does run a DHCP server that will
respond to requests from cloud instances. However, if you have a
DHCP server on your LAN, it may be possible that your instances
could receive a DHCP address from your other DHCP server
rather than the cloud controller's DHCP server. You may want to
tell your main DHCP server to ignore requests sent from the
MAC addresses of the cloud instances. All of the cloud instances
have MAC addresses that begin with d0:0d. On my DHCP server
running dnsmasq, all I had to do was add a line to the
dnsmasq.conf file that said dhcp-host=d0:0d:*,ignore.


Resources

Ubuntu Enterprise Cloud Documentation: https://help.ubuntu.com/community/UEC

Ubuntu Server 9.10 Download: www.ubuntu.com/getubuntu/download-server

Eucalyptus Home Page: open.eucalyptus.com

Installing UEC Using the Installer CD: https://help.ubuntu.com/community/UEC/CDInstall

Installing UEC Using the Package-Based Install: https://help.ubuntu.com/community/UEC/PackageInstall

Bundling Your Own UEC Images: https://help.ubuntu.com/community/UEC/BundlingImages

Using the Walrus Storage Controller: https://help.ubuntu.com/community/UEC/StorageController


MySQL Replication


MySQL has been running on Linux in the data center for many years, 
but MySQL replication still is not widely understood. Whether you’re 
setting up a multinational asynchronous solution or a local cluster, 
there is a type of replication for your application. 

MICHAEL NUGENT 


MySQL databases come in all shapes and sizes, 
but most often are deployed behind Web sites. As sites 
grow, the companies behind them often become concerned 
about uptime and want to move to a high-availability model. 
Unfortunately, by this point, options often are limited based on 
the data and database engines previously chosen. 

The first decision to make when moving to a high-availability
model is actually whether to do it. This may seem obvious,
but it often is assumed that high availability always is a good
option. Opting to move in this direction also adds significant
complexity to the system as a whole. When deploying
a clustered solution, the number of boxes increases. Thus, the
number of individual failures will increase, even though
the downtime of the clustered application decreases overall.
In addition, as availability increases, cost increases. Adding
a second box for failover doubles the cost of the server, and
adding failover clusters in alternative geographies can double
(or more) the operational cost of the entire data center. On
top of this, moving to an NDB cluster itself requires additional
hardware.
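
The trade-off between more individual failures and less overall downtime can be put into numbers with a small sketch. It rests on simplifying assumptions that are not from the article: each box fails independently with the same probability in a given period, the service stays up as long as one box survives, and the 5% figure is purely illustrative.

```python
def expected_box_failures(boxes, p_fail):
    """Expected number of individual box failures in one period."""
    return boxes * p_fail


def p_cluster_down(boxes, p_fail):
    """Probability that every box is down at once (independent failures)."""
    return p_fail ** boxes


# Hypothetical per-box failure probability of 5% per period.
for boxes in (1, 2, 3):
    print(boxes, expected_box_failures(boxes, 0.05), p_cluster_down(boxes, 0.05))
```

Each added box raises the first number in a straight line while shrinking the second geometrically, which is the sense in which administrators see more failures even as the application sees less downtime.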


Before considering the level of availability you need, consider 
the cost of downtime compared to the operational costs of 
running failover facilities. Very few facilities need to continue 
running if there is a global extinction event, and planning for 
that situation would require the budget of a large government. 
Planning for failure as a result of a multinational economic 
disaster is more reasonable, but it still requires the budget of a 
large multinational corporation. When planning for national and 
localized disasters, the cost becomes more reasonable for most 
companies to handle. Based on this end-of-the-world thought
experiment and the needs of most users, the development of
high availability for MySQL mostly focuses on
the single-geography cluster.

Although you can build clusters in many ways, using 
combinations of block replication and SAN storage, the official 
MySQL solutions are replication and NDB (Network DataBase) 
clusters. Each has pros and cons, and your choice is not based 
on newer or older developments within MySQL, but on what 
is right for your application. In addition to choosing a type 
of replication, the version of MySQL is also critical. Because of
continuing software development, at least version 5.1 is
required for many of the features described here. If your 
current database has a lower version, strongly consider 
upgrading the database software before implementing 
these solutions. Active development on version 5.0 ended 
in December 2009, and active development on version 5.1 
will end in December 2010. 

MySQL replication establishes a master-slave or master- 
master relationship between a pair of servers. These servers 
can be chained to build a circular set of many servers, or one 
master can be used for many slaves, but the relationship itself 
exists between only two. A single MySQL server can have only 
a single master. Replication is more flexible than NDB in terms 
of what types of engines and features can be used. Although 
NDB clusters are limited to NDB tables, replication can be used 
with almost any of the standard MySQL table types, including
MyISAM and InnoDB.

Multimaster replication examples usually are set up with 
only two servers, but they can be done with any number in 
a circular set. To set up a circular set with three servers, 
the [mysqld] section of the my.cnf configuration file should 
include the following: 

server-id=1                 # This must be unique per server.
auto_increment_offset=1     # Must be unique per server and no
                            # greater than the
                            # auto_increment_increment value below.
auto_increment_increment=3  # Set to at least the maximum
                            # number of servers in the circle.

The auto_increment_offset value determines the starting 
point for autojncrement columns and must be unique per 
server but less than the autojncrementjncrement value. 
The autojncrementjncrement value determines the interval 
between autojncrement values on a particular server. To 
prevent conflicts, set it to at least the maximum number 
of servers in the circle. 

Now, to determine the next value in an autojncrement 
column, the server multiplies the next value expected in counting 
order by the autojncrementjncrement value, plus the 
autoJncrement_offset value. If N is the next expected value 
in a sequence (for example, 1, 2, 3, 4, 5 and so on), the next 
value for an autojncrement column becomes: 

N x auto_increment_increment + auto_increment_offset 
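
As a rough illustration of the formula, the short Python sketch below lists the values each of three servers would hand out with auto_increment_increment=3 and offsets 1, 2 and 3. The function name is made up for the example, and it counts N from 0 so that an empty table starts at the offset itself; that starting point is an assumption made for the example, so check it against your own server.

```python
def auto_increment_series(offset, increment, count):
    """Return the first `count` values of N * increment + offset.

    N is counted from 0 here (an assumption for this sketch), so the
    first value handed out equals the offset.
    """
    return [n * increment + offset for n in range(count)]


INCREMENT = 3  # at least the number of servers in the circle

# One series per server; the server-id doubles as the offset in this sketch.
series = {
    server_id: auto_increment_series(offset=server_id, increment=INCREMENT, count=4)
    for server_id in (1, 2, 3)
}

for server_id, values in series.items():
    print(f"server {server_id}: {values}")
```

Because the three lists never overlap, rows inserted on different servers cannot collide on the auto_increment column.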

In addition, add the following values: 

■ log_slave_updates: this tells the server to log updates from 
its master into its own log, so that the machine can act as 
both a master and a slave. 

■ slave_exec_mode=IDEMPOTENT: this feature is strictly
optional. It allows the slave to skip errors. Although it can
be useful to make sure that slave replication does not stop
due to an error, it can be dangerous, as it can cause the
slave to desynchronize from the master, resulting in a
different data set on each server. Use of InnoDB tables and
transactions and rollbacks can help limit this possibility.

Once the my.cnf file is set up, each server needs to have 
the replication user granted access for replication and set to 
point at the master database: 

server A mysql> GRANT REPLICATION CLIENT,
                    REPLICATION SLAVE,
                    SELECT, FILE, PROCESS,
                    SUPER, RELOAD ON *.* TO 'replication'@'%'
                    IDENTIFIED BY 'replpass';
server A mysql> flush privileges;
server A mysql> change master to
                    MASTER_HOST="serverB.example.com",
                    MASTER_USER="replication",
                    MASTER_PASSWORD='replpass';
server A mysql> start slave;

server B mysql> GRANT REPLICATION CLIENT ... ; (as above)
server B mysql> flush privileges;
server B mysql> change master to
                    MASTER_HOST="serverC.example.com",
                    MASTER_USER="replication",
                    MASTER_PASSWORD='replpass';
server B mysql> start slave;

server C mysql> GRANT REPLICATION CLIENT ... ; (as above)
server C mysql> flush privileges;
server C mysql> change master to
                    MASTER_HOST="serverA.example.com",
                    MASTER_USER="replication",
                    MASTER_PASSWORD='replpass';
server C mysql> start slave;

In this scenario, server A gets data from server B, which 
gets data from server C, which gets data circularly from server 
A. Data can be inserted into any of the three servers and will 
be replicated to the other two. This doesn't speed up writes 
(except possibly for additional drive spindles), but it can add 
additional speed for reads if the application rotates between 
servers in the cluster. In addition, if hardware fails, the
remaining servers in the cluster still hold copies of the
data. Removing a dead slave can be as simple as using the
"change master" statement to point at the grandfather 
server or replacing the dead server and simply copying a 
snapshot of the data. 
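
The circular relationship, and the "grandfather" repair after a failure, can be pictured with a small Python model. This is only a sketch of the topology: the function names are invented for the example, and it deliberately ignores binary log positions, which a real "change master" statement would need to get right.

```python
def build_ring(servers):
    """Map each server to its master.

    Each server replicates from the next one in the list, and the last
    server replicates from the first, closing the circle.
    """
    return {
        server: servers[(i + 1) % len(servers)]
        for i, server in enumerate(servers)
    }


def remove_dead_server(masters, dead):
    """Return a new mapping without `dead`.

    The server that used to replicate from the dead one is pointed at
    the dead server's own master (its "grandfather").
    """
    grandfather = masters[dead]
    return {
        server: (grandfather if master == dead else master)
        for server, master in masters.items()
        if server != dead
    }


ring = build_ring(["serverA", "serverB", "serverC"])
print(ring)

after_failure = remove_dead_server(ring, "serverB")
print(after_failure)
```

With serverB gone, serverA is repointed at serverC, and the two survivors form a plain master-master pair.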

In contrast to the loosely bound multimaster replication, 
NDB clusters can be viewed as a single entity. In fact, they're 
so tightly coupled, an NDB cluster entity can be used as a 
single server in a multimaster replication scenario. Although 
NDB is a huge advantage in terms of cluster synchronicity 
and management, the NDB engine does not support all the
features of other MySQL engines, and the NDB engine is the
only one that can be used in an NDB cluster. The NDB engine 
does not support savepoints within MySQL and, thus, cannot 
support transactions and rollbacks. As a result of this, when an 
ALTER TABLE or CREATE TABLE command is issued, the table 
being altered should not be accessed. Although it locks the 
table on the current node, this is a local lock only and may 
cause data integrity problems or even crashes if the table is 
accessed on another node. 

Before setting up software for an NDB cluster, hardware is 
a concern. Because the clusters can act as a synchronous unit 
and there is no encryption built in to the package, a private 
network or VLAN is a best practice to employ. If the idea is to 
use NDB in a Web application, the inter-database links should 
be separated from the links that are used to execute queries. 

Networking in an NDB cluster can be set up in a number of 
different ways, but most users prefer standard TCP-over-Gigabit 
or higher-speed networks. 100Mbit networks will work, 
but they will be far less efficient for larger systems. 10Mbit 
networks are not supported. Although the replication configuration
requires only the MySQL servers to be part of the setup,
NDB requires management nodes and data nodes in addition
to the MySQL servers. All of these should demonstrate the
lowest latency possible. Jumbo frames also are a good idea on
Gigabit networks, because fitting as much data into the packet
as possible decreases the possibility of any kind of errors
interrupting the synchronous transfer of data between NDB nodes.

In addition to networks, increased RAM is a necessity as
caching becomes more of a priority to decrease network traffic.
The best way to increase I/O to disk is to increase spindles.
Adding more, smaller disks will increase the I/O throughput of
any given node, but this is especially important on the data
nodes, because these are the ones that have the actual data
on disk. Increasing the number of data nodes also will increase
the read speed of the cluster as a whole, but it will not increase
the write speed significantly. If the database is behind a Web
server, the read-to-write ratio is usually so high, this is exactly
the kind of performance that is good for the application. If
most of the application's queries are for writing data, focusing
on the speed of single nodes is the best strategy.

Before setting up the configuration for an NDB cluster,
be sure that it is available in your distribution of MySQL.
The SHOW ENGINES command should include an engine
type of NDBCLUSTER with the Support column set to Yes.
If this is not available, check your distribution for an external
package, or install or compile the community package
from www.mysql.com.

The NDB configuration has three types of servers. The
management server does configuration and monitoring of
the cluster via the ndb_mgmd daemon. The data nodes store
the data running the ndbd daemon, and the SQL nodes run
the mysqld server itself. Although it is possible to run multiple
types of daemons on the same physical servers, it is not
recommended in a production environment. The best practice
is to separate all of them onto their own pieces of hardware.

NDB clusters use two configuration files. The first
file, my.cnf, is the standard configuration file for MySQL. The
second file, config.ini, is read only by the management server.
The config.ini file includes configuration for the data nodes
and is passed to them by the management server.

The additions to the my.cnf file are fairly straightforward:

[ndb_mgm]
ndb-connectstring=manage.example.com:1186   # The management server

[ndb_mgmd]
config-file=/etc/config.ini

[ndbd]
ndb-connectstring=manage.example.com:1186   # The management server

[mysqld]
ndbcluster                      # This turns the cluster on
ndb-force-send=1                # Sends buffers immediately
ndb-index-stat-enable=1         # Optimizes queries with NDB
                                # index statistics
engine-condition-pushdown=1

[mysql_cluster]
ndb-connectstring=manage.example.com:1186   # The management server

The engine-condition-pushdown option forces
MySQL to send the query directly to the storage engine
instead of evaluating it in the mysql daemon. In an NDB
cluster, this allows the NDB engine to spread queries across
multiple data nodes.
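
Every section of the my.cnf additions above that names the management server has to name the same one. One way to catch a typo early is a small check like the following Python sketch, which uses the standard configparser module. The helper name is hypothetical, and the configuration text is a trimmed copy of the listing embedded as a string purely for illustration; in practice you would read your own file.

```python
import configparser

# A trimmed copy of the my.cnf additions, embedded as a string for the example.
MY_CNF = """
[ndb_mgm]
ndb-connectstring=manage.example.com:1186   # The management server

[ndbd]
ndb-connectstring=manage.example.com:1186   # The management server

[mysqld]
ndbcluster
ndb-force-send=1
engine-condition-pushdown=1

[mysql_cluster]
ndb-connectstring=manage.example.com:1186   # The management server
"""


def connect_strings(text):
    """Return {section: ndb-connectstring} for every section that sets one."""
    parser = configparser.ConfigParser(
        allow_no_value=True,              # "ndbcluster" has no value
        delimiters=("=",),                # keep "host:port" intact
        inline_comment_prefixes=("#",),   # strip trailing comments
        interpolation=None,
    )
    parser.read_string(text)
    return {
        section: parser.get(section, "ndb-connectstring")
        for section in parser.sections()
        if parser.has_option(section, "ndb-connectstring")
    }


found = connect_strings(MY_CNF)
consistent = len(set(found.values())) == 1
print(found, "consistent" if consistent else "MISMATCH")
```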

A basic config.ini file also is fairly easy to write. It must 
be placed in the location specified by the config-file line in 
the my.cnf file: 

# Management Node
[ndb_mgmd default]
DataDir=/var/lib/mysql-cluster   # This is where the management
[ndb_mgmd]
HostName=manage.example.com      # The machine's hostname
[ndbd default]
NoOfReplicas=2                   # There are 2 data nodes
[ndbd]
HostName=datanode.example.com    # The machine's hostname
[mysqld]
HostName=datanode.example.com    # The machine's hostname
DataDir=/var/lib/mysql           # This is where the data
                                 # node keeps data

In this scenario, there are three servers, one of each type: a 
management server, a data node and an SQL node. Queries are 
sent to the SQL node, which then interoperates with the data
nodes. Because the SQL nodes can talk to multiple data nodes
as necessary with the optimizer inside the NDB engine (set via
the engine-condition-pushdown option above), this type of
replication can operate on some SELECT queries far faster than
the multimaster replication setup discussed above. On the other
hand, NDB uses synchronous replication, so queries that write
data, such as INSERTs and UPDATEs, can take longer, because
the data must be written to each node on the cluster. 

The problem of extending systems across various locations 
is well known in the industry. With Web servers and static 
content, this is a fairly simple situation. With cacheable 
content, this can be done using various caching services. 

Dealing with MySQL across multiple geographies is complex 
at best. It is not reasonable to set up an NDB cluster with nodes 
in separate data centers. Even if there is dedicated bandwidth 
between the boxes in the cluster, the latency across the link will 
cause large delays in issuing write commands to the system. 
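
To put rough numbers on that delay, the Python sketch below estimates how many synchronous commits a single connection could issue back to back if every commit has to wait for the network. The round-trip times are invented for illustration, and the default of one round trip per commit is a simplifying assumption rather than a description of NDB's actual commit protocol.

```python
def max_serial_commits_per_second(round_trip_ms, round_trips_per_commit=1):
    """Upper bound on back-to-back synchronous commits from one connection.

    Assumes each commit must wait for `round_trips_per_commit` network
    round trips and that nothing else takes any time.
    """
    return 1000.0 / (round_trip_ms * round_trips_per_commit)


# Hypothetical round-trip times in milliseconds, for illustration only.
links = {
    "same rack": 0.2,
    "across a city": 5.0,
    "across an ocean": 80.0,
}

for name, rtt in links.items():
    rate = max_serial_commits_per_second(rtt)
    print(f"{name} ({rtt} ms): at most {rate:.1f} commits/s per connection")
```

Even under these generous assumptions, the ceiling falls by orders of magnitude once the link leaves the building, which is why the synchronous cluster is kept inside one data center.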

The accepted way to set up a multi-geography NDB cluster 
is to have two separate NDB clusters, one per data center, and 
set up an asynchronous multimaster (or master-slave for 
failover only) replication system between the two. To do this, 
set up NDB clusters normally, add the auto_increment statements
to the my.cnf file, add replication user permissions, and issue 
the "change master" statement at the MySQL prompt. 

This asynchronous relationship between geographies will 
create a great way to distribute your load across the systems, 
but it is still asynchronous. There can be cases where queries 
will return different results from the different locations where 
the data has not yet completed replication. If the application 
is a Web site showing photographs, this is generally not a 
problem. If the application is a bank, this inconsistency could 
result in large problems. 
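
The window of inconsistency is easy to see in a toy model. The Python sketch below is not MySQL code; it simply queues a write made at one site and applies it at the other after a fixed delay. The site names, the delay and the "balance" key are all invented for the example.

```python
class Site:
    """A toy data center holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def replicate(pending, now, replica):
    """Apply every queued write that has arrived by `now`; return the rest."""
    remaining = []
    for arrival, key, value in pending:
        if arrival <= now:
            replica.data[key] = value
        else:
            remaining.append((arrival, key, value))
    return remaining


LAG = 3  # hypothetical replication delay, in arbitrary time units

east, west = Site("east"), Site("west")
pending = []

# t=0: a write lands on the east site and is queued for the west site.
east.data["balance"] = 100
pending.append((0 + LAG, "balance", 100))

# t=1: the west site has not received the change yet.
pending = replicate(pending, now=1, replica=west)
stale_read = west.data.get("balance")

# t=3: the change has arrived and both sites agree.
pending = replicate(pending, now=3, replica=west)
fresh_read = west.data.get("balance")

print(stale_read, fresh_read)
```

Between the write and its arrival, the two sites give different answers to the same question, which is harmless for a photo gallery and unacceptable for an account balance.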

Building a MySQL cluster, either using replication or NDB 
clusters, is a difficult task to get right the first time. Doing it 
in a hurry or with existing data in the system makes it even 
harder. Setting up a few systems as a test lab is a necessity. 
Although virtual machines are a good platform for setting up 
the configuration of the system, testing end-to-end performance 
also is a necessity in order to verify that the application will not 
suffer from poor database performance. This requires time on 
the actual hardware with the actual data and, if possible, a 
file full of actual queries run on the system. Multi-geography 
setups are even more difficult with a small budget, and it is 
good practice to think hard about the operational expense of 
running a second location compared to the cost of downtime. 
Finally, as good as your replications and clusters are, they are 
not a substitute for backups. Save early; save often.
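
Replaying a file of real queries against the test lab does not need much tooling. The Python sketch below shows one possible shape for such a harness. It uses the standard sqlite3 module and a two-line made-up query log purely as stand-ins so that the example runs anywhere; on the real hardware you would substitute a MySQL driver that follows the same DB-API and your own captured queries.

```python
import sqlite3
import time


def replay(connection, statements):
    """Run each statement once and return (statement, seconds) pairs."""
    timings = []
    cursor = connection.cursor()
    for statement in statements:
        start = time.perf_counter()
        cursor.execute(statement)
        cursor.fetchall()  # pull the full result so the fetch is timed too
        timings.append((statement, time.perf_counter() - start))
    return timings


# Stand-in database and query log (both hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO photos (title) VALUES (?)", [("sunset",), ("boat",)])

query_log = [
    "SELECT COUNT(*) FROM photos",
    "SELECT title FROM photos WHERE id = 2",
]

results = replay(conn, query_log)
slowest = max(results, key=lambda pair: pair[1])
print(f"slowest statement: {slowest[0]} ({slowest[1]:.6f} s)")
```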


Michael Nugent has spent a good deal of his time designing large-scale solutions to fit into tiny
budgets, leveraging Linux to fulfill the roles that typically would be filled by large commercial
appliances. Recently, Michael has been working to design large multi-geography database solutions
for growing startups in the Silicon Valley area. When not building systems, he likes sailing, MIG
welding and hanging out with his cat, MIDI. Michael can be reached at michael@michaelnugent.org.

The Challenges of Open Source in the Enterprise

Enterprise and open source—are they peanut butter and jelly, 
working well together for a better world, or are they oil and water, 
meeting but never coming together? In this article, I explore the 
challenges of adopting open source in the enterprise. I look at the 
technical issues, the business challenges and the political hurdles. 

AVI DEITCHER 

There is an old Chinese curse, "may you live in interesting
times." Of course, we all want to live in interesting
times, but sometimes the interesting part can be a bit
much. The enterprise is an interesting place. On the one hand,
real enterprises have technology budgets that are quite large,
sometimes even running into billions of dollars. Much of
that budget is for labour, meaning that a successful enterprise
technology person can make very good money, while learning
a lot on the way. Although your typical tech shop may have
a few servers and program in, say, Ruby, with an HTML front
end backed by MySQL, in an enterprise, you are likely to
encounter, and learn, every technology out there. If you like
Ruby, it is there; Java, most certainly; .NET, that too. If your
preferences run to infrastructure, you are likely to find everything
from Windows servers to Linux to UNIX variants to mainframes
to the unexpected. As recently as 2001, I worked as head of
enterprise management at a place that had a massive farm of
DOS 3.1 PCs; those were "interesting times".

On the other hand, enterprises don't start or end with cool
technology, and they are there to serve a business purpose. The
most famous illustration of this is the Nine-Layer OSI Model by
the legendary Evi Nemeth.

Sure, you may have the best solution to a problem, but in an
enterprise, you need to get the budget approved—on a multiyear
cycle, of course—and then you likely need to go before
some sort of capital expenditure (CapEx) or major expenditure
review (MER) committee. Everyone there views your request as
competing with their priorities for 1) budget allocation, because
even a $1-billion IT budget is still finite, and 2) recognition and
promotion, because after all, they want you to succeed, but
they want their own projects to succeed even more. Finally,
enterprises have legitimate business support needs that may or
may not be resolved by your open-source solution.

Start with the Technical

At base, everyone interested in open source is interested in 
technology, so let's address the technical challenges first. 
As you may have noticed, enterprises spend a lot of money. 
Unsurprisingly, to quote Willie Sutton who used to rob banks 
because "that's where the money is", many commercial 
technology businesses build products to focus primarily on the 
enterprise and solve its unique problems, and they have very 
large sales and marketing budgets to sell them. On the other 
hand, open-source products often are built, at least initially, to 
solve very specific problems. 

Thus, before advocating for open source, we need to 
understand if the open-source solution solves the problem as 
well as the commercial solution, given the entire requirements 
set. This includes not just the immediate technical problem, 
such as "serve up a Web page", but also the management 
challenges that can be unique to an enterprise, such as 
"replicate in real-time across 15 databases in ten countries 
around the world, while instantly alerting to any degradation 
and providing service-level agreement (SLA) reporting". In 
many cases, open source has indeed developed to the point
where it truly can compete on a technical requirements level
with commercial products. In other cases, it is not yet sufficiently
evolved, but it may be some day. And in some cases, it is
literally impossible to solve the problem with open source.
Let's examine two extreme examples.

1. Web servers: the dominant Web server for many years, of
course, has been Apache. Although various competitors nip
at its heels, such as IIS for Windows or nginx for sheer
performance, Apache remains dominant for both intranet
and Internet Web serving. In 2010, it is not hard to make
the argument to adopt Apache for a Web server solution in
the enterprise. It is mature, established, lots of well-known
companies bet the business on it, and it has the various
controls, hooks, logging and security that an enterprise
demands. It is important to remember, however, that only
a few years ago, Apache was not sufficient, and other
commercial variants arose to fill in the gap, such as
Apache Stronghold. The combination of a mature product,
a complete enterprise-ready feature set and broad enterprise
adoption make open-source Apache a selection as valid as
any commercial solution.

2. Network infrastructure: in the old days, when we had to
decide whether to route mail via UUCP or SMTP, we built our
own firewalls. Routers simply were dedicated servers with
multiple network interface cards (NICs) on which we ran
software to route the traffic. Over time, however, the proliferation
of networks and the demand for traffic-routing capacity
and intelligent control exceeded the capabilities of these
homegrown solutions. Special companies were formed to
create specialized networking hardware. The most famous, of
course, is Cisco. Although a small organization can make do
with a simple router, or even a dedicated box with a few NICs
running m0n0wall, such a solution is highly unlikely to
work in a large enterprise. There, the complexity, traffic
demands and management requirements, as well as a three-tier
architecture (core, distribution and access layers) can be
done far more cost effectively, and in some cases, only with
a hardware solution. Clearly, open source is not about to run
enterprise networks. Having said that, it is not impossible that
a split could occur. Currently, enterprise network equipment
manufacturers provide both the hardware and software to
manage routing, some of which may be based on open
source, such as Cisco ASA 8.x. It is possible that in the near
future, a pure-hardware networking equipment manufacturer
could be formed that would sell the hardware only, while
software is provided via an open-source solution, in a manner
similar to current servers.

The important takeaway from evaluating any technology
is that it has to solve the immediate problem, such as serving
Web pages, but also have the features required for an
enterprise, such as management, logging and security. Rarely
does it matter that the open-source product may be better or
that you want to support the community that brought us
Linux/Apache/whatever. For adoption in the enterprise, the rule
remains, as it should anywhere, first solve the actual problem
and everything related to it.

Move to the Business 

In addition to solving technical problems, some of which are 
specific to an enterprise, there are unique enterprise business 
requirements as well. In a small IT environment or Web startup, 
no one wants a problem or outage any more than in an 
enterprise. However, the technical tolerance may be greater 
in a smaller environment, and it often is acceptable that the 
trade-offs require the lone in-house expert (that would be you) 
to "take care of the problem" in an emergency; often that is 
the actual crisis plan. In an enterprise, with postmortems, 
roles and responsibilities, and sometimes "pin the blame on 
the donkey", a support plan of "I will deal with it and work 
with on-line fora when it breaks" will not go over very well. 
The cost of error or failure is at least proportional and often 
even exponential to the size of the IT budget. 

These challenges create a minor requirement known as 
a service-level agreement (SLA). IT promises its customers, 
whether internal or external, certain service levels. In order 
to meet those levels, there needs to be a predictable and 
reliable point of service for every element of technology. 

For HP servers, there is a service contract and spares; for 
routers, it is Cisco support or a partner; for open source, 
it is ... ? In many cases, the product is stable enough or 
distributed enough not to matter. In other cases, it matters 
greatly. "If it breaks, who will fix it?" is likely the number 
one question CIOs will ask. They are not being difficult; 
they simply are doing their jobs, determining whether they
can meet SLAs and what will be the true fully loaded cost
of your open-source adventure.

www.linuxjournal.com july 2010 | 51

In that respect, one of the more interesting business 
ideas in the last decade is Red Hat. Its products are almost 
entirely open-source products that can be downloaded for 
free from elsewhere. However, it sells versions with full 
support. Essentially, Red Hat has decoupled product devel¬ 
opment from product support. There is nothing particularly 
special about Sun that allows it and only it to support 
Solaris (at least since Solaris was made open source). 

Anyone with sufficient expertise can do so. Recognizing 
that truth is the key to providing support for open-source 
products, exactly as Red Hat has done for Linux. It sold 
more than a half-billion dollars in support subscriptions in 
2009 for products that it, by and large, did not develop. 

Don't Discount Politics 

Politics is the bane of every technologist's existence. Politics 
is about the subtle art of power interplays, personalities 
and compromise. Technology, on the other hand, is about 
science, the truth and the correct way. For a technologist, 
proving your point through tests and scientific answers 
is the right way to go, but this path only antagonizes 
outsiders. For politics does not care about the right 
answer, but about the one that meets people's needs, 
rational and emotional. The solution may very well not 

‘‘If it breaks, who will fix it?” IS LIKELY THE 

be the best one. It may not even really solve the technical 
problem, but it is the one adopted nonetheless. 

Around six years ago, I was exploring solutions to a 
particular problem at a very large enterprise (around 100,000 
employees). There were several solutions, but the one I was 
advocating was open source. The other leading candidate 
was proprietary. I had a very good relationship with the 
firm's attorney, with whom I discussed the issue. "Let's say 
the product fails spectacularly", she said, "and we lose 
$10MM in business because of it, who do we come after? 
Who do we blame?" From her perspective, an attorney who 
is focused on the firm's legal needs, this is a perfectly valid 
reason to go for closed source, backed by a large company. 
From my perspective, I far preferred to go with the solution 
that would not only cost far less, but also would provide 
better performance, thus reducing the probability and 
expected cost of failure, let alone spectacular failure. 

As an aside, it is also important to note that my perspective
could be difficult for her politically. If we focus
solely on reducing the probability and expected cost of 
failure, and accept damages due to failure as an unfortunate 
cost of doing business, then the legal department's value 
is concomitantly reduced. If she has any influence over the 
final decision, and she did, these issues, seemingly irrelevant 
to most technologists, must be taken into account. In 
this case, I actually did win her over by pointing to the 
End-User License Agreement (EULA). Like most such EULAs, 
there was a very strong limitation of liability. For example, 
if you read the EULA to Microsoft Windows XP Professional 
Edition, it clearly states that your Exclusive Remedy is limited
to either replacement of the defective software or possibly
refund of the cost of the software itself. If $10,000 in 
software causes $10MM in damage, the most you can 
get back is $10,000 (maybe). I pointed out that the legal 
department had already been rendered irrelevant for this 
software, and not by me. Thus, the choice of solution 
would neither reduce their position, nor strengthen 
someone (me) who had reduced that positioning already. 

Politics is the art of recognizing who wins and who loses 
with each decision. Understand the relationships, the power 
plays, who has the backing of the vendor you are explicitly 
discarding, who controls the budgets, and you will be in a 
better position to pick your battles and win them. 

Tying It All Together 

Open source has had huge amounts of successful adoption in 
the enterprise: Linux, Apache, Xen, Perl, PHP, Java and the list 
goes on. Open source also has had failures, either failure to launch 
(where it does not get adopted) or explosion on the launchpad 
(where it is adopted and fails). When looking to adopt an open- 
source solution in an enterprise, it is important to remember 
the entire nine-layer model and answer three questions: 

1. Does it meet all of the technical requirements, including those 
that are unique to running any technology in an enterprise?

2. Does it have sufficient support and maturity to meet the 
business requirements of the enterprise? 

3. Can I move it through the process while taking into account 
the politics inherent in any enterprise? 

If the answer to all three is positive, you have a good situation 
for promoting adoption of an open-source solution.


Avi Deitcher is an operations and technology consultant based in New York and Israel who has
been involved in technology since the days of the Z80 and Apple II, and he has worked with
global enterprises through tiny Web startups. He has a BS in Electrical Engineering from Columbia
University and an MBA from Duke University. He can be reached at avi@atomicinc.com.


Resources

Evi Nemeth: www.cs.colorado.edu/~evi
Nine-Layer Model T-Shirt: https://www.isc.org/node/232
Microsoft Windows XP Pro EULA: www.microsoft.com/windowsxp/eula/pro.mspx
Cisco: www.cisco.com
m0n0wall: m0n0.ch
Red Hat: www.redhat.com

NoSQL databases
are all the rage, but



The articles on NoSQL databases in Reuven M. Lerner’s At the Forge column
appearing in recent issues of LJ have been enjoyable. Because this is the 
Enterprise issue, I think it would be helpful to take a step back and look at the 
Linux database landscape and examine in particular the ongoing “battle” 
between SQL and NoSQL databases. By way of disclosure, I work for Monty Program, a 
company whose primary product is MariaDB, a community-enhanced branch of MySQL. 
That being said, I approached this topic with as open a mind as possible. 

The rivalry between SQL and NoSQL has been building during the past year to the point 
where some people are predicting the end of the SQL era. Actually, the two camps are largely 
complementary, because they’re designed to solve different problems. 


Daniel Bartholomew

The Case for NoSQL

So, what is the big deal about NoSQL 
databases? For one, they've introduced 
new ways (or perhaps re-introduced old 
ways) of thinking about what databases 
are and what they can do. For another, 
they're shiny and new, and all the cool 
kids seem to be using them. You could 
argue that Google's BigTable is the 
database that inspired the NoSQL 
movement. Or, maybe it was Amazon's 
S3. Both of them are closed source, but 
they were (or are) impressive enough to 
inspire open-source interpretations. 

The current NoSQL field includes 
HBase, Cassandra, Redis, MongoDB,
Voldemort, CouchDB, Dynomite, 
Hypertable and several others. Some 
have followed the model of BigTable, 
others follow S3's model, some are 
a mix of the two, and others are 
charting their own path. Some of 
these projects are more mature than 
others, but each of them is trying to 
solve similar problems. 

Instead of having tables with 
columns and rows like you would find 
in a traditional RDBMS, most NoSQL 
databases are simple "key-value stores". 
Each piece of data that goes into the 
database is given a key, and when you 
want the data back, you use the key 
to get it. This simplicity is beneficial, 
because it helps busy sites achieve 
extremely low latency, even under high 
load, when paired with a large number 
of servers and a fast network. The 
simplicity of the key-value model also 
simplifies development. 
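
To make the idea concrete, here is a toy key-value store in Python. It is only a sketch of the programming model, not of any of the projects named above: the class, its method names and the key format are all invented for the example, and a real store would spread the keys across many servers.

```python
class KeyValueStore:
    """A toy in-memory key-value store.

    Values are opaque to the store; the only way to get one back is
    by the key it was stored under.
    """

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)


store = KeyValueStore()
store.put("user:42:name", "Alice")
store.put("user:42:last_login", "2010-07-01")

name = store.get("user:42:name")
missing = store.get("user:99:name")
print(name, missing)
```

There are no tables, joins or schemas to think about, which is where both the speed and the simplicity come from.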

A step beyond simply having keys 
and values are the so-called document 
databases. A document, in this case, is 
a collection of various fields of
information. Each individual document can have
a different number of fields of varying 
lengths. These databases are useful if 
you have a lot of semi-structured data, 
and they are a good fit for object- 
oriented programming models (for 
example, you can consider the database 
as a storage area for objects). 
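
A few lines of Python show what "semi-structured" means in practice. The documents, field names and the find helper below are all made up for illustration and do not follow the API of any particular document database.

```python
# Three documents in one collection; no two have the same set of fields.
documents = [
    {"_id": 1, "type": "photo", "title": "Sunset", "tags": ["beach", "evening"]},
    {"_id": 2, "type": "photo", "title": "Boat"},
    {"_id": 3, "type": "comment", "photo_id": 1, "text": "Nice colours"},
]


def find(docs, **criteria):
    """Return the documents whose fields match every key/value in `criteria`.

    A document that lacks one of the fields simply does not match.
    """
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


photos = find(documents, type="photo")
comments_on_first = find(documents, type="comment", photo_id=1)
print(photos)
print(comments_on_first)
```

Each document maps naturally onto an object in the application, which is why this style suits object-oriented code.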

Why do traditional database users 
dislike these newcomers? D. Richard 
Hipp, the creator of SQLite, in a talk 
given at my local LUG, derisively called 
NoSQL databases "post-modern 
databases", because instead of giving 
you a definite answer to your question, 
they give you "an opinion" or their
"best guess". His purpose was to paint
NoSQL databases in a bad light, and for
most of the old-school database world, 
the NoSQL, non-relational, BASE model 
(see the What Is ACID? sidebar) is more 
than a bit heretical. 

The heresy comes because historically, 
databases almost always have tried to 
implement the relational model and 
be fully ACID-compliant. If your transactions
weren't ACID, or your database
wasn't relational, the argument went, 
you couldn't call yourself a "real" 
database. Look at the MySQL vs. 
PostgreSQL flame wars for ample 
evidence of this thinking. 

The problem though, is that being 
relational and ACID is not necessary 
for some use cases and can add 
unnecessary overhead, which you 
don't want if you are running a popular,
heavily trafficked Web site. Many
early users of MySQL knew this and 
were mocked for choosing MySQL 
over "real" databases like PostgreSQL. 
It is ironic now that MySQL has 
gained what every "expert" said it 
should have (ACID transactions), that 
a new movement has started up 
claiming that all the old database 
technology isn't actually necessary. 

What is necessary for top-tier 
Web sites, according to proponents 
of NoSQL, is massive scalability, low 
latency, the ability to grow the capacity 
of your database on demand and an 
easier programming model. These, and 
others, are things which, according to 
them, SQL RDBMSes just don't provide 
in a cost-effective manner. 

Most classic RDBMSes initially were 
designed to run on a single large server. 
That is how it was done in the late 
1970s and early 1980s, and the idea 
exists in the design of many RDBMSes 
to this day. The P in CAP (see the What 
Is CAP? sidebar) is meaningless when 
the database is on a single server (the 
server is either up or down, rarely or 
never only partly up), and traditional 
RDBMSes have focused mainly on 
Consistency, aka ACID, with Availability 
thrown in if you mirror between 
database servers or use hardware 
with no single points of failure. 

Some NoSQL databases also focus 
on the C and A parts of CAP. Unlike 
traditional RDBMSes, though, these
databases are designed from the 
ground up to be run on dozens, 
hundreds or even thousands of nodes 


Acronyms 

Whenever the topic of databases 
arises, an alphabet soup is thrown 
around that would make NASA 
proud. Some of the acronyms I use 
a lot in this article include: 

■ RDBMS: Relational Database 
Management System. 

■ SQL: Structured Query 
Language, also used to refer 
to databases that use SQL 
as their query language. 

■ NoSQL: used to refer to a class 
of databases that are non¬ 
relational and do not use SQL 
as their query language. They 
could perhaps be better 
called Distributed Database 
Management Systems (or 
DDBMSes), but for now, the 
popular term is NoSQL. 

■ ACID: Atomicity, Consistency, 
Isolation, Durability (see the 
What Is ACID? sidebar). 

■ CAP: Consistency, Availability, 
Partition tolerance (see the 
What Is CAP? sidebar). 


in a single data center. Partial partition 
tolerance for these databases is obtained 
by mirroring database clusters between 
multiple data centers. The advantage 
these databases have over a traditional 
RDBMS is that with the work spread 
over all of those machines, you can 
achieve ultra-low latency even when 
there are extremely high numbers of 
reads and writes, and with all those 
machines, you can analyze massive 
amounts of data quickly. 

Other NoSQL databases focus on the 
A and P parts of CAP and are designed 
to span multiple data centers. True to 
CAP, strong consistency is impossible 
for these databases. Weak consistency 
is an especially heretical thought to 
the RDBMS old guard. Instead, these 
NoSQL databases implement eventual 
consistency, whereby any changes are
replicated to the entire database eventually,
but at any given time, a single
node or group of nodes may not have 


www.linuxjournal.com july 2010 | 55 



FEATURE SQL vs. NoSQL 


What Is CAP? 

The CAP Theorem, also called Brewer's Theorem, first was proposed by Eric 
Brewer in a July 2000 keynote at the ACM Symposium on the Principles of 
Distributed Computing. It was formally proved in 2002 by Seth Gilbert and Nancy 
Lynch of MIT. The CAP Theorem states that it is impossible for any shared-data 
system to guarantee simultaneously all of the following three properties: 
consistency, availability and partition tolerance. 

Consistency in CAP is not the same as consistency in ACID (that would be too easy). 
According to CAP, consistency in a database means that whenever data is written, 
everyone who reads from the database will always see the latest version of the data. 
A database without strong consistency means that when the data is written, not 
everyone who reads from the database will see the new data right away; this is 
usually called eventual consistency or weak consistency.

Availability in a database according to CAP means you always can expect the 
database to be there and respond whenever you query it for information. High 
availability usually is accomplished through large numbers of physical servers 
acting as a single database through sharding (splitting the data between various
database nodes) and replication (storing multiple copies of each piece of data 
on different nodes). 
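
As a rough illustration of how splitting and copying data can work together, here is a minimal Python sketch. It assumes a fixed list of nodes and a simple hash-based placement rule; the node names, the `replicas_for` function and the replication factor of 3 are all hypothetical, and real databases use more elaborate schemes (consistent hashing, rack awareness and so on).

```python
import hashlib

def replicas_for(key, nodes, replication_factor=3):
    """Return the nodes that should hold copies of `key` (toy placement rule)."""
    # Sharding: hash the key to pick the node that "owns" it.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Replication: also store copies on the next nodes in the list.
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

# Hypothetical five-node cluster.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = replicas_for("user:42", nodes)
print(placement)
```

Because the placement depends only on the key and the node list, any client can work out where a piece of data lives without asking a central coordinator.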

Partition tolerance in a database means that the database still can be read from and 
written to when parts of it are completely inaccessible. Situations that would cause 
this include things like when the network link between a significant number of 
database nodes is interrupted. Partition tolerance can be achieved through some 
sort of mechanism whereby writes destined for unreachable nodes are sent to nodes 
that are still accessible. Then, when the failed nodes come back, they receive the 
writes they missed. In Cassandra, this is called hinted handoff. A database with good 
partition tolerance can span multiple data centers, whereas one with weak partition 
tolerance basically is bound to a single data center. 
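
The store-now, deliver-later mechanism described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's actual implementation: the `Node` class, the `write` and `replay_hints` functions, and the single fallback node are all invented for illustration.

```python
class Node:
    """A toy database node that can be marked unreachable."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target_node, key, value) kept for unreachable nodes

def write(key, value, replicas, fallback):
    """Write to every replica; park a hint on `fallback` for any that is down."""
    for node in replicas:
        if node.up:
            node.data[key] = value
        else:
            fallback.hints.append((node, key, value))

def replay_hints(holder):
    """Deliver stored hints to targets that are reachable again."""
    remaining = []
    for target, key, value in holder.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining

a, b, c = Node("a"), Node("b"), Node("c")
b.up = False                                  # simulate a network partition
write("greeting", "hello", [a, b], fallback=c)
b.up = True                                   # the link comes back
replay_hints(c)                               # b now receives the write it missed
```

Between the write and the replay, node b is missing the new value, which is exactly the window in which such a system is only eventually consistent.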


the latest data. As with the NoSQL databases
that focus on C and A, the focus for
A and P databases is on low latency, 
high throughput and anything else that 
makes the Web site more responsive 
and a richer experience for users. 

In addition to sometimes abandoning 
consistency in favor of scalability and 
latency, another way NoSQL databases
break with tradition is in their abandonment
of the relational model. To
be fair, some data truly does not naturally
fit the relational model. This
could be because the data changes 
form or size often, or because the 
data is completely unstructured. 

The final break with tradition in 
NoSQL databases is the thing that gave 
them their name. They don't use SQL. 
The reasons for dropping SQL usually 
revolve around it not fitting in with 
modern object-oriented development 
processes or some perceived difficulty 
in working with SQL. Sometimes the 


excuse given for not using SQL is a 
simple "SQL sucks", which isn't really 
a reason. Stupid reasons aside, the SQL 
language was designed for use with 
relational databases, and NoSQL 
databases are mostly non-relational, so 
it makes sense that they don't use it. 

The Case for SQL 

So what about plain-old SQL RDBMSes? 
Should they be retired from active 
service? Are they a relic from an earlier 
time? Not so fast. 

First and foremost, ACID transactions 
most definitely are required in certain 
use cases. Databases used by banks and 
stock markets, for example, always 
must give correct data. Where money 
is concerned, guessing is not allowed. 
It is true that no one really cares if your 
latest tweet takes a couple minutes to 
show up in your Twitter feed, but the 
same cannot be said for a billing system 
or accounting database. 


Another thing in favor of RDBMSes 
is their use of SQL. It's a common language,
and if you need to move from
one database to another, you usually 
can get away with making only minor 
changes to your application, and it will
"just work". True, it may not be possible
in all cases, depending on how you
used or abused the SQL queries in your 
application, but the foundation for 
moving easily between different SQL 
databases is there, and the tools and 
libraries you can use to interact with 
your data are plentiful and robust. A 
unified NoSQL standard query language 
or API will never exist because every 
NoSQL database is so different. 

On the NoSQL side, the only thing 
in common is that there is nothing in 
common. Each NoSQL database has its 
own set of APIs, libraries and preferred 
languages for interacting with the data 
it contains. With an RDBMS, it is
trivial to get data out in whatever format 
you need using whatever programming 
language you like best. Your choice of a 
NoSQL database might limit you to one 
or a handful of programming languages 
and access methods. 

Another thing RDBMSes have going 
for them is the relational model. The 
R in RDBMS traces its history back to 
research by E. F. Codd published in 
the June 1970 issue of Communications 
of the ACM. Since then, it has been 
expanded upon, improved and clarified. 
The relational model for databases is 
so popular because it is an excellent 
way to organize information. It maps 
very well to an enormous variety of 
real-world data storage needs, and 
when properly normalized, it is fast 
and efficient. 

In the relational model, data is 
stored in tables with rows and columns. 
An address table, for example, might have 
columns for street name and number, 
city, postal code, state or province, and 
country. A name table might have columns 
for given names, family name, prefixes 
(Dr, Rev, Ms and so on) and suffixes 
(Jr, Sr, Esq and so on). Each row in 
the individual tables would represent 
an individual address or name. 

The relational part (see the What 
Does Relational Mean in a Relational 
Database? sidebar) comes into play as 
you define which addresses relate to 
which names using a key. A key is a 
field (the intersection of a row and
column) or combination of fields in a
single row that is guaranteed to identify 
uniquely that particular row in the table 
it is in. For the address table, you might 
have a column for keys from the name 
table. You can use this key to look up 
just those addresses in the address table 
that "belong" (by virtue of the key) to 
a certain name in the name table. 
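
To make the name and address example concrete, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The table layouts, column names and sample rows are assumptions made up for illustration; only the idea of an address table carrying a key from the name table comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name (
        id          INTEGER PRIMARY KEY,   -- the key for each name row
        given_names TEXT,
        family_name TEXT
    );
    CREATE TABLE address (
        id      INTEGER PRIMARY KEY,
        name_id INTEGER REFERENCES name(id),  -- key borrowed from the name table
        street  TEXT,
        city    TEXT
    );
    INSERT INTO name VALUES (1, 'Ada', 'Example');
    INSERT INTO name VALUES (2, 'Grace', 'Sample');
    INSERT INTO address VALUES (1, 1, '1 First St', 'Springfield');
    INSERT INTO address VALUES (2, 1, '2 Second Ave', 'Shelbyville');
    INSERT INTO address VALUES (3, 2, '3 Third Rd', 'Springfield');
""")

# Look up only the addresses that "belong" to one name, by way of the key.
rows = conn.execute(
    "SELECT a.street, a.city FROM address AS a "
    "JOIN name AS n ON a.name_id = n.id "
    "WHERE n.family_name = ? ORDER BY a.id",
    ("Example",),
).fetchall()
print(rows)
```

The same two tables could answer the reverse question (who lives in a given city?) without any change to how the data is stored, which is much of the appeal of the model.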

My example is pretty simplistic, but 
when combined with ACID transactions 
in an RDBMS, you achieve tremendous 
power, flexibility and reliability. There is 
a reason that businesses began using 
them decades ago and why open-source 
RDBMSes dominate the Web. 

And, what about the Web? The 
primary argument many people use 
against RDBMSes is that they "don't 
scale", which simply isn't true. It is true 
that some individual RDBMSes do not 
scale very well or are harder to scale, 
but that doesn't mean every RDBMS 
cannot. RDBMSes are in use at every 
large company. The largest RDBMS 
installations routinely handle enormous 
traffic and petabytes of data. 

This scaling myth is perpetuated and 
given credence every time popular Web 
sites announce that such-and-such 
RDBMS doesn't meet their needs, and 
so they are moving to NoSQL database 
X. The opinion of some in the RDBMS 
world is that many of these moves are 
not so much because the database they 
were using is deficient in some fundamental
way, but because it was being
used in a way for which it wasn't 
designed. To make an analogy, it's like 
people were using flat-head screwdrivers 
to tighten Phillips-head screws, because 
it worked well enough to get the job 
done, but now they've discovered it is 
better to tighten Phillips screws with an 
actual Phillips screwdriver, and isn't it 
wonderful, and we should throw away 
all flat-head screwdrivers, because their 
time is past, and Phillips is the future. 

One recent SQL-to-NoSQL move 
involved Digg.com moving from MySQL 
to Cassandra. As part of the move, Digg 
folks blogged about how they were 
using MySQL and why it didn't meet 
their needs. Others were skeptical. 
Dennis Forbes, in a series of posts on his 
site (see Resources), questioned whether 
Digg needed to use a NoSQL solution 
like Cassandra at all. His claims centered 
on what he considered very poor 
database usage on the part of Digg 


What Does Relational Mean 
in a Relational Database? 

In common usage, the relational part of Relational Database refers to (or is 
often assumed to refer to) the way tables are related to each other via keys. 
For the truly pedantic, though, this is in fact incorrect. Relational here does
not refer to relationships between tables; rather, it refers to the mathematical
concept of a relation, which is in essence what relational databases call tables. 
A relational database is a database based on the relational model.

What Is
ACID? 

ACID is the classic measure of 
determining whether your 
database is good. A transaction 
in a database is a single logical 
operation. An example would be 
inserting an address or updating 
a phone number in an employee 
database. Every database provides
methods to do operations
like those, but ACID formalizes 
the process. 

Atomicity means that the transaction
either succeeds or fails. If the
transaction fails, it should fail 
completely, and the database 
should be left in the state it was 
in before the transaction started. 

Consistency means that the database 
is in a known good state both 
before and after the transaction. 

Isolation means that transactions 
are independent of one another, 
and if two transactions are trying 
to modify the same data, one of 
them must wait for the other to 
finish before it can begin. 

Durability means that once the 
transaction has completed, the 
changes made by the transaction 
will persist, even if there is a system
failure. A transaction log of
some sort usually is used for this 
purpose. In MariaDB and MySQL, 
this is called the binary log. 

So, what is the opposite of ACID? 
BASE (Basically Available, Soft-state, 
Eventual consistency), of course. 
BASE is a retronym coined by
Dan Pritchett in an article in the
ACM Queue magazine for describing
a database that does not
implement the full ACID model, 
with the main difference being 
that it is eventually consistent. The 
idea is that if you give up some 
consistency, you can gain more 
availability and greatly improve 
the scalability of your database. 
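
Atomicity is the easiest of the four ACID properties to see in action. The following sketch uses Python's built-in sqlite3 module; the `account` table, the names in it and the `transfer` function are hypothetical. It relies on the documented behavior that a sqlite3 connection used as a context manager commits if the block succeeds and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"   # overdrafts are not allowed
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# The credit to bob succeeds, but the debit would overdraw alice,
# so the whole transaction is rolled back, including the credit.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

Without the transaction, the failed debit would have left bob 500 richer and the books out of balance.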


combined with inadequate hardware. 
In his mind, if Digg had just designed 
its database properly or switched to 
using SSDs in its servers, it would 
have had no problems. His best quote 
is this: "The way that many are using 
NoSQL is like discovering the buggy 
whip at the beginning of the automotive
era." Ouch.

Relational databases sometimes can 
be tricky to design properly. You have to 
know and understand your data deeply. 
But when they are designed properly, 
the performance can be orders of 
magnitude better compared to poorly 
designed databases. You also should 
not overlook the hardware on which 
your database runs. Databases love as 
much memory and processing power 
as you can throw at them, and the 
traditional spinning-platter disk drive 
has long been a limiting factor. Does 
the high performance of SSDs herald 
a new age of RDBMS performance? 
Many experts say yes. SSDs may be a 
game-changer in the database world. 


Relational SQL databases have been 
around for several decades. They have 
proven reliability and performance and a 
feature set that meets the requirements 
of 99% of the use cases out there. They 
even make excellent key-value databases, 
if that's the type of data you have. There 
are only very few companies that can't 
make a relational database work for 
them. You may not like to hear it, but 
with the law of averages, chances are 
your company is not one of them. 
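
As a hedged illustration of the key-value point, a two-column table is all it takes. This sketch again uses Python's sqlite3 module; the `kv` table and the `put` and `get` helpers are made-up names, and `INSERT OR REPLACE` is SQLite's own syntax (other RDBMSes spell the same "upsert" idea differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table, two columns: the primary key gives fast lookups by key.
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

def put(key, value):
    """Store a value, overwriting any existing value for the same key."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

def get(key, default=None):
    """Fetch the value for a key, or `default` if the key is absent."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default

put("session:1", "alice")
put("session:1", "bob")        # overwrites the earlier value
print(get("session:1"), get("missing"))
```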

Conclusion 

My advice? Don't think of SQL vs. 
NoSQL as an either/or question. Options 
are a good thing. Many alternatives 
exist, so if you are having issues with 
your chosen database, experiment with 
different products on both sides and run 
your own benchmarks. 

Also look into how you are using 
your database. If the database was
"bootstrapped" while you were creating
your killer application or service, and
it is starting to give you problems, you 
might have an easily solvable design 
issue at the root of your troubles. If 


databases are not your thing, consult 
with an expert. RDBMSes have been 
around a long time, and there are 
plenty of experts. 

Whatever you decide to do, don't 
think of NoSQL as your escape from the 
SQL RDBMS world. NoSQL databases 
are not a panacea. I asked my boss, 
Monty Widenius, the creator of 
MySQL, what his opinion on the whole 
NoSQL vs. SQL thing was. His answer: 
"Non-SQL gives you a very sharp 
knife to solve a selected set of issues. 
If you find SQL too hard to use, you 
should not try Non-SQL." 

His basic point is that if you don't 
understand SQL RDBMSes, you'll probably 
end up hurting yourself by jumping 
into NoSQL. Key-value stores like those 
found in NoSQL databases do work 
for certain kinds of data, but they 
don't work well at all for other kinds. 

It is instructive to point out that the 
companies that use and have championed 
NoSQL databases have not given up 
on SQL RDBMSes. They continue to 


use them in vital roles. 

Finally, many of the NoSQL ideas 
are based on old technology. Key-value 
stores have been around for more than 
20 years, for example. New this time 
around are things like map-reduce 
(some claim that even this is an old 
idea), which spread the workload over 
many computers. In that sense, NoSQL 
databases really should be called 
distributed-DBMSes (DDBMSes?). Basically, 
distributed RDBMSes, without the R. 
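
For readers who have not met map-reduce before, here is a single-machine sketch of the idea in Python, counting words across a few chunks of text. The function names and sample data are invented for illustration; in a real deployment, each chunk would be mapped on a different machine and the grouped results reduced in parallel.

```python
from collections import defaultdict
from functools import reduce

def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single result."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in grouped.items()}

# Each chunk stands in for the data held on one node.
chunks = ["sql nosql sql", "nosql acid base", "sql acid"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each call to `map_phase` looks at only its own chunk, the work can be spread over as many machines as there are chunks.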

Whatever you call them, NoSQL 
databases are solving problems that 
were considered "solved" by many 
in the RDBMS world a long time ago. 
They're just solving the problems in a 
different way, and they have a different 
set of requirements. If this new-old way 
solves an issue you're having, great! On 
the flip side, if your current RDBMS is 
meeting your needs, don't feel like you 
need to jump on the bandwagon.


Daniel Bartholomew works for Monty Program as a technical 
writer and system administrator. He lives with his wife and 
children in North Carolina and often can be found hanging 
out on both #linuxjournal and #maria on Freenode IRC. 


Resources


SQL Databases: 

MariaDB: askmonty.org 
PostgreSQL: www.postgresql.org 

NoSQL Databases: 

Cassandra: cassandra.apache.org 

CouchDB: couchdb.apache.org 

HBase: hadoop.apache.org/hbase 

Redis: code.google.com/p/redis 

Voldemort: project-voldemort.com 

MongoDB: www.mongodb.org 

Hypertable: hypertable.org 

Dynomite: wiki.github.com/cliffmoon/dynomite/dynomite-framework

BigTable: labs.google.com/papers/bigtable.html 

Brewer’s CAP Theorem: 

Brewer's CAP Theorem by Julian Browne: 

www.julianbrowne.com/article/viewer/brewers-cap-theorem 

CAP Theorem: 

devblog.streamy.com/2009/08/24/cap-theorem 


Towards Robust Distributed Systems by Dr Eric A. Brewer: 

www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf 

Brewer's Conjecture and the Feasibility of Consistent Available 
Partition-Tolerant Web Services (2002) by Seth Gilbert and Nancy 
Lynch: citeseer.ist.psu.edu/544596.html 
E. F. Codd's "A Relational Model of Data for Large Shared Data Banks": 

www.seas.upenn.edu/~zives/03f/cis550/codd.pdf 

Other Links: 

Dennis Forbes on Software and Technology: 

www.yafla.com/dforbes 

Looking to the future with Cassandra by Ian Eure: 

about.digg.com/blog/looking-future-cassandra 

NOSQL debrief by Johan Oskarsson: 

blog.oskarsson.nu/2009/06/nosql-debrief.html 

BASE: An Acid Alternative by Dan Pritchett: 

queue.acm.org/detail.cfm?id=1394128 

Should you go Beyond Relational Databases? by Martin Kleppmann:

carsonified.com/blog/dev/should-you-go-beyond-relational-databases

NoSQL Q and A: www.dbms2.com/2009/12/11/nosql-q-and-a 

NoSQL Video by Brian Aker: 

www.youtube.com/watch?v=LhnGarRsKnA


INDEPTH


Comparing Linux and 
Microsoft Windows for 
Enterprise Usage 

Selling Linux in the Enterprise often is a tough job, but with the right information,
you can start making the case for Linux.

JERAMIAH BOWLING


For far too long, Linux has existed on the periphery of 
enterprise computing. Whether it is skepticism of open-source 
technology, a preference for paid instead of community 
support or the ever-forking tree of distributions, many businesses 
have shied away from Linux. In recent years, commercial 
Linux vendors have been hard at work polishing their 
distributions in the hope of establishing a beachhead in the 
enterprise. These mature distributions have rendered many 
past criticisms moot, and coupled with new opportunities in 
emerging technologies like virtualization, Linux stands poised 
to re-establish itself as an enterprise-caliber operating system. 
However, if these vendors are to be successful, they must 
take on the leviathan in the enterprise: Microsoft. 

In this article, I discuss several areas of the enterprise that 
are prime candidates for Linux adoption or expansion. In each 
case, I look at the current Microsoft offering in that area and 
then highlight a legitimate Linux-based contender. In doing 
so, I do not intend to keep a running score card and come up 
with an unsurprisingly biased conclusion (this is Linux Journal 
after all). I merely want to start the conversation in order to 
demonstrate Linux's inherent business value and strengthen 
the community at large. 

There are a few caveats before I proceed. For the purposes 
of this article, I have blurred the line between server and desktop
platforms to keep the discussion at a strategic level. The
topics I examine may touch upon aspects of one or both platforms.
I also have limited the distributions used here to those
with paid support, as they tend to be targeted at the enterprise
market. With the exception of BIND and DHCP, I have
avoided any technologies/packages, such as LAMP, Samba,
Sendmail or any iconic Linux app I felt already has been beaten
into the ground with comparisons. I want to bring something 
new to the table. Finally, this article does not tackle the thorny 
issue of application serving or application compatibility. We all 
know the vast majority of business apps are developed for the 
Microsoft platform. Wine and/or Mono are not the answers. 
Developing software to emulate another vendor's code always 
will leave Linux users behind their Microsoft counterparts. 
However, the rapid growth of Web-based apps, advancements 
in virtualization (application and desktop) and the arrival of 
cloud computing may change this dynamic in the near future 
as applications become separated from the desktop. 


Desktop Security—User Account Control/Security 
Configuration Wizard 

User Account Control (UAC) has been an essential part of 
Microsoft OSes since Vista. UAC protects the OS by requiring 
services and programs to operate with the correct permissions 
via security confirmation prompts. It is meant to limit the number
of programs that run with unnecessary administrative privileges,
a long-criticized weakness of applications developed for
the Microsoft platform. Although UAC has received praise for 
making strides to eliminate this weakness, many admins have 
found that prolonged use leads some users simply to click Yes 
on the elevation prompts rather than evaluate the security risk. 
This leads to the elevation of non-desired programs, possibly 
to the detriment of the system. UAC can be complemented 
with the use of the Security Configuration Wizard that locks 
down unnecessary ports and services using a form-like survey 
to determine your minimum necessary configuration. 

Security always has been an important component of the 
Linux pedigree. Utilities like sudo and chroot, which limit the 
context of certain programs and operations, long have been 
part of the Linux security toolbox. In the case of Ubuntu and some other
Debian-based distributions, the root account is locked by default, and
administrative access goes through sudo. Also, most distros now utilize either AppArmor
or SELinux as an additional security layer at the host level. 
Although SELinux and AppArmor take different tacks to
securing a system, each utilizes a least-privilege-based
approach to minimizing the threat surface through the use
of profiles. Although SELinux (Figure 1) has the distinction of
being developed by the National Security Agency and of being
extremely secure, it can be difficult to administer. By contrast,
many admins believe AppArmor is just as effective and easier
to configure. Novell includes a nice GUI tool for AppArmor in
SUSE Enterprise Linux that includes a wizard for profiling
applications that is a real time-saver (Figure 2).

Figure 2. SUSE AppArmor Wizard

Host-Based Firewalls—Windows Firewall 

The Windows firewall included in Server 2008 and Windows 7 
is a great improvement over previous incarnations. It filters on 
packets, IP addresses and source/destination program, and its 
management GUI is easy to use. However, it lacks some of the 
advanced features found in Linux-based firewalls. In contrast,
Linux has been wed to open-source firewall development in
near lockstep since ipchains and now iptables. Although many 
admins still prefer the text-based administration of iptables, 
there are many easy-to-use GUI-based interfaces, such as the 
one found in SUSE through Yet another Setup Tool (YaST, 
Figure 3). Unfortunately, these tools often limit access to 
advanced features, such as port redirection, IP translation and 
quality of service, which can be accessed from the command 
line. To be fair, some of these capabilities are available in 
Server 2008 by adding other modules (RRAS) or products (ISA), 
but that adds another layer of administration and cost where 
Linux possesses them out of the box. Some admins may feel 
that firewalls are not a significant factor in enterprise security 
except in the perimeter. Others suggest that firewalls are more 
important now than ever, because technologies like the cloud 
and mobile computing are erasing the traditional boundaries 
of the perimeter. Only time will tell. 

Package Management/Updates—Automatic
Updates/Windows Server Update Services

The last decade easily could have been labeled the Decade of 
the Patch. Because of the ever-evolving security landscape, 
new vulnerabilities are discovered daily. Don't get me wrong. 
Security researchers provide an invaluable service to the industry,
but sometimes when I have to push patches en masse
daily, I pine for the old days when I could just push a single 
service pack every so often. Patching is not solely a Microsoft 
phenomenon. Vulnerabilities exist in Linux as well. Most 
modern operating systems worth their salt include a native 
updating mechanism to address flaws and vulnerabilities. In 
Windows, it is Automatic Updates for individual systems or 
Windows Server Update Services (WSUS) for managing a
large number of systems. Microsoft has done well with both 
programs and should be applauded for their maturation in the 
last five years. As its name implies, Automatic Updates automates
the patching of host systems through a Control Panel
interface. WSUS adds reporting features and the ability to
centralize patch distribution, although the process for approving,
denying and/or superseding patches can be kludgy.

Figure 5. Canonical's Landscape Service for Ubuntu

Linux updating mechanisms vary by distribution, but 
share similar functionality with their Microsoft counterparts. 
Debian-based systems have apt, Red Hat-based systems have 
Yellowdog Updater, Modified (YUM), and SUSE has YaST
(which provides a graphical front end to the ZYpp package 
management engine). Each tool is easy to automate and 
includes the ability to resolve dependency issues prior to an 
update. They also share the ability to deploy local repositories 
to reduce bandwidth consumption as with WSUS, but to 
achieve the nicer dashboard and reporting features of WSUS 
requires subscription-based services, such as Red Hat Network 
(Figure 4) or Landscape from Canonical (Figure 5). 

Basic Network Services—Microsoft DNS/DHCP 

DNS and DHCP are production network roles where many 
Linux servers make their entry into an enterprise. Although 
these services may seem boring, they form the backbone of 
the modern enterprise. On the Microsoft side, we have the 
proprietary versions of DNS and DHCP included in Server 
2008. Both are configured using the Server Manager utility
and then administered through their respective MMC consoles.
Microsoft has integrated its versions of DNS and DHCP 
deeply with Active Directory (AD) and a multitude of its proprietary
network services. Although on the surface this may not
seem like a problem, a single misconfiguration can affect multiple
parts of the Microsoft infrastructure (AD, Exchange and
so on). On the Linux side, we have the Berkeley Internet Name 
Domain (BIND), the standards-based market leader. BIND is a 
dependable workhorse that has enough flexibility to support 
Active Directory and keep DNS administration separate from 
other parts of the infrastructure. You can administer BIND 
through the command line or GUI tools like the Red Hat BIND 
Configuration Tool (Figure 6). 

Figure 6. Red Hat’s BIND Configuration Tool

Alongside DNS, DHCP is a critical, though overlooked,
network service. It also is an excellent springboard for Linux
in a new environment. It is low impact and can integrate into 
almost any existing network with little interruption. DHCP is 
available in most distros, and tools like those found in YaST 
make administration a snap (Figure 7). DNS and DHCP usually 
can be combined on a single server, as is found in many 
Microsoft environments, but with a smaller footprint. 

Directory Services—Active Directory 

Active Directory is the heart of Microsoft networking. It is a 
powerful tool that has a solid reputation for providing reliable 
directory services. Chances are, unless you are already a *nix 
shop, you're probably using it right now. AD has dominated 
the landscape for so long that many people forget its roots. 

In the strictest sense, AD is an LDAP-based server that uses 
Kerberos for authentication and DNS for name resolution. The 
reason for its dominance is twofold: Microsoft's flagship mail product
(Exchange) requires it, and every Microsoft desktop and server
OS shipped has a built-in AD client. Directory services existed
before AD, and other alternatives are available (even non-Linux
ones) that provide similar services.

One of the better alternatives is eDirectory from Novell 
(Figure 8). eDirectory has its roots in Novell Directory Services
(NDS), the highly popular directory service that dominated the 
enterprise in the 1990s. Although Novell has lost considerable 
market share to AD in the last decade, it has continually 
improved its directory products. eDirectory is scalable, supports 
multimaster replication and is OS-agnostic, which means it 
can easily be deployed to almost any environment (including 
Windows). For Linux systems, eDirectory can run on either SUSE 
or Red Hat Enterprise servers. eDirectory can be managed by
using ConsoleOne (Figure 8) or the newer, sexier iManager 
Web management package (Figures 9 and 10) that uses role- 
based assignment of privileges. This is similar to AD; however, 
the level of granularity over directory permissions found in 
iManager is far greater. As a side note, Novell currently has a 
standing relationship with Microsoft that each will support the 
other's products. This could be a benefit when campaigning for 
a bigger Linux presence in a Microsoft-centric enterprise.

Figure 9. User Creation in iManager (eDirectory)



Figure 10. eDirectory Management Tasks in iManager 

Virtualization—Microsoft Hyper-V 

Virtualization may be the hottest topic in the industry at 
the moment. It seems like "virtual" is the buzzword of 
every other Webinar out there. I won't spend time explaining 
the value of virtualization, save that server consolidation and 
desktop/application virtualization seem to be the biggest 
reasons so many people are interested in it. Microsoft made 
a major move into the virtualization arena with its release
of Hyper-V. Unlike Microsoft's earlier product, Virtual Server,
Hyper-V sports a fully virtualized hypervisor that removes
the need for running a virtual server on top of a "fat host".
Hypervisors allow guests to access underlying hardware
directly, and because there is very little overhead, performance
is dramatically improved. Hyper-V has received a number of
improvements with the release of Server 2008 R2. It now has
more enterprise-grade capabilities for management and high
availability, and most notably, support for live migrations. It
can be managed with the Hyper-V Manager Console, an
enterprise-grade tool for creating and managing Hyper-V
hosts and guests.

There are Linux-based options for virtualization as well. For 
the longest time, Xen was the darling of the Linux virtualization 
movement. Following the acquisition of Xen by Citrix, many 
vendors have begun making the switch to using the Kernel-based 
Virtual Machine (KVM) module as their primary virtualization 
platform. KVM is a hypervisor module that can run in a 
kernel of 2.6.20 or higher, but it does require a compatible 
VM-enabled processor (one with Intel VT or AMD-V extensions).
Red Hat, a huge supporter of Xen prior to its acquisition, has
hitched its wagon to KVM. In fact, Red Hat is releasing its
KVM-based Red Hat Enterprise Virtualization (RHEV) product as
a direct competitor to Hyper-V, VMware and Xen. RHEV is
composed of a minimalist RHEL KVM-enabled installation,
tweaked as a host system for virtualization. Unlike most
virtualization products on the market, RHEV is rolling out a
competitive subscription-based pricing model that includes
both the hypervisor and manager software
in the same license (often sold separately). It also touts
advanced virtualization features, such as live migration and
automatic server failover. I really wanted to test-drive RHEV
for this article, but I was unable to obtain a trial version of
the product. Regardless, KVM runs nearly flawlessly in most
distributions. For demonstration purposes, I deployed KVM on
Ubuntu, which provides a Just enough OS (JeOS, pronounced
"juice") image configured specifically for virtual appliances.
KVM hosts can be managed using the GUI-based virt-manager
package (Figure 11) or other command-line tools.

Figure 11. Managing VMs with virt-manager in Ubuntu

Cloud Computing—Microsoft Azure/Cloud
Computing Initiative

Cloud computing is almost as buzzworthy as virtualization,
which is funny considering that it is an offshoot of the
virtualization movement. Cloud computing refers to a strategy of
using a pool of resources (such as servers, storage, bandwidth) or a
"cloud" to offer individualized servers or services to customers.
Cloud services usually pertain to Web-based application services,
but more and more apps are appearing "in the cloud". These
newer apps include corporate e-mail hosting, file storage, user
collaboration and mobile apps. Clouds are a cost-beneficial
proposition for smaller customers that want the advantages of a
data center (clustering, high availability/disaster recovery) without
the cost of maintaining one. Amazon has been a pioneer in this
area with its Elastic Compute Cloud (EC2) service where you can
purchase your own cloud servers or applications that run within
the Amazon cloud. Microsoft has jumped into the market and
poured considerable resources and energy into the emerging
technology. It has been live with its public cloud, Azure, since
2009. Microsoft's private cloud, which will be managed through
System Center, is scheduled for release in the first half of 2010.

Figure 12. Ubuntu Enterprise Cloud Web Interface

Figure 13. Managing Cloud Instances with Hybridfox

If you want to deploy a private Linux-based cloud now, 
you can do so with Ubuntu. The process is remarkably simple. 
Download Ubuntu server and launch the server install process. 
Upon boot, you will see an option from the main install screen 
to install the server as a Ubuntu Enterprise Cloud (UEC) server 
either as a cluster controller or as a node. You will need one of 
each to get started. Once up and running, you can download 
images from the management site (Figure 12) or begin creating 
your own images that match your cloud needs. The cloud you 
are deploying actually is a re-branded version of the open-source 
cloud software Eucalyptus. Management is accomplished via 
command-line or GUI-based tools like hybridfox (Figure 13), 
a Firefox add-in that runs like a modified version of Amazon's 
Elasticfox management utility. 

Many other areas of the enterprise are ripe for Linux penetration.
The ones presented here represent some of the best chances
for Linux adoption in the vast majority of enterprises. I encourage 
you to download and test these options to see how beneficial 
they can be to your business. Linux's future development, its very 
survival, rests in its ability to stake a claim in the business 
computing market, and the only way to do that is by constantly 
challenging the status quo with viable, cost-saving alternatives. 
Hopefully, I've given you some of those alternatives here.


Jeramiah Bowling has been a systems administrator and network engineer for more than ten years. 
He works for a regional accounting and auditing firm in Hunt Valley, Maryland, and holds numerous
industry certifications, including the CISSP. Your comments are welcome at jb50c@yahoo.com.


Resources 


Red Hat Network: https://rhn.redhat.com

Canonical Landscape: www.canonical.com/projects/landscape

BIND: www.isc.org/software/bind

Novell eDirectory: www.novell.com/products/edirectory

RHEV: www.redhat.com/virtualization/rhev/server

KVM: www.linux-kvm.org

Ubuntu Enterprise Cloud (Private): www.ubuntu.com/cloud/private

Hybridfox: code.google.com/p/hybridfox


Getting Started with Quickly

Quickly helps you (quickly) write applications with Python and GTK. Jono Bacon


At the heart of what makes Linux thrive as an operating 
system are applications. Within it is a vibrant, diverse range 
of applications, satisfying even the most particular needs, all 
just a few clicks away. With such an imaginative range of 
applications available, a similarly vibrant developer community 
has formed, complete with a vast array of tools, languages 
and functionality. Unfortunately, although powerful, many 
of these tools are awkwardly complex, and many developers 
have let their ideas and creativity become buried under an 
avalanche of confusion around how those tools fit together. 

Part of the cause of this problem is that many developer 
tools cater only to systematic developers—the kind of 
code-writing workaholics who hack for a living, with a 
fervent attention to detail backed up by unit tests and 
other hallmarks of the professional programmer. There are, 
however, developers of a different sort who are driven by 
writing practical code, scratching their itches and having 
fun writing programs and sharing them with others. These 
are opportunistic developers. 

As part of our work in Ubuntu, we have been keen to 
harness opportunistic developers and enable them to do great 
work using Ubuntu as a platform. As part of this goal, we 
have developed a series of tools to make it simple for you to 
break down the barrier between idea and implementation, 
and help you to scratch your itches more quickly and easily. 
One such tool is Quickly (wiki.ubuntu.com/Quickly). 

Enter Quickly 

Quickly gets you up and running (quickly, of course) writing an 
application from scratch. Traditionally, writing desktop applications
has involved a not-insignificant amount of faffing,
with build systems, source control, packaging frameworks, 
graphical interface tools and other things that get in the way 
of writing code. Quickly is a tool that simplifies how those 
different things fit together. 

Quickly provides a framework with a series of templates 
for creating different types of applications. With each template, 
a series of opinionated decisions are made about the tools 
involved in creating that application. By far, the most popular 
template and the one that Quickly itself was created to satisfy 
is the Ubuntu template. This template uses a set of tools that 
has become hugely popular in modern desktop software development,
and tools we have harnessed in Ubuntu. They are:

■ Python: a simple, easy-to-learn, flexible and efficient, 
high-level language. 

■ GTK: a comprehensive and powerful graphical toolkit for 
creating applications and the foundation of the GNOME 
desktop environment. 


■ GNOME: the desktop environment that ships with Ubuntu, 
offering many integration facilities. 

■ Glade: an application for creating user interfaces quickly 
and easily, which then can be loaded right into your 
Python programs. 

■ GStreamer: a powerful but deliciously simple framework 
for playing back and creating audio, video and other 
multimedia content. 

■ DesktopCouch: a framework for saving content in a 
database that is fast and efficient, hooks neatly into Ubuntu 
One and is awesome for replication. 

■ gedit: for editing code—Quickly assumes you are going to use 
the text editor that ships with Ubuntu, which provides a simple 
and surprisingly flexible interface for writing your programs. 

With this core set of tools, you can write any application 
you can imagine and know that it will run effortlessly on 
Ubuntu and other distributions. Let's make the magic happen. 

Getting Quickly 

Today, Quickly primarily is used on Ubuntu and is not currently 
packaged for other distributions, although we hope this 
changes in the future and that other distributions use Quickly 
too. If you are running Ubuntu, getting Quickly is as simple 
as installing from the Ubuntu Software Center or firing up 
a terminal and running: 

sudo apt-get install quickly

After a few minutes, you should be up and running. 

Creating a Project 

With Quickly installed and ready to roll, let's start 
creating a simple application. Fire up a terminal with 
Applications→Accessories→Terminal, and enter the
following command: 

quickly create ubuntu-project myapp

This command uses Quickly to create a new Ubuntu Project 
called myapp. You will see a flurry of lines fly past your eyes 
as Quickly generates the new project and saves its various files 
inside a new directory called myapp. When Quickly finishes 
generating the project, it runs it automatically, and you should 
see a window that looks remarkably similar to Figure 1. 

Figure 1. Myapp Main Window

The generated application has a number of important
elements common to many applications, such as a menu bar,
menu items and status bar, and it also includes a label with
some text and a rather nice Ubuntu circle of friends image.
Feel free to click through the menus and play with your new 
program. It won't do much yet, but from this pre-existing 
base, you now can turn it into any program you want. Let's 
start working on it. First, go into the project directory: 

cd myapp/ 

Quickly has a series of commands that each begin with 
the quickly command. The first command you need to know 
is how to run your program. Simply use the run command: 

quickly run 

This runs your program and displays it on the screen. When 
you're finished with the program, you can close it down either 
by clicking the window close button or pressing Ctrl-C inside 
the terminal. 

Now, let's create a really simple program that demonstrates
how basic development works with Quickly and its key components:
Python and the GTK widget set. To do this, the program
will have a text entry box, and when you type in a word, it will
search for that word on Google. Although delightfully simple,
it demonstrates the basics well and is a good place to start.

Changing Your Program's Interface

In every program, you can use buttons, scrollbars and other 
interactive things to click on to construct your program interfaces.
These building blocks for creating interfaces are called
widgets, and they are part of the GTK toolkit. 

First, let's make some changes to the user interface to 
remove some unneeded widgets. You also will want to 
add a text box widget. To edit your user interface, use a 
program called Glade, which lets you visually construct 
your interface by pointing and clicking. Later, you can 
hook different widgets up to code that does interesting
things. First, load Glade with: 

quickly glade 

When Glade pops up, it should look eerily similar to Figure 2. 
(Note: in Quickly 0.4, the command is quickly design.) 



Figure 2. Glade 

Glade has a few components to its interface. In the middle,
you can see the current interface on which you are working.
There, you can click on widgets to highlight them, move them
around, delete them and more. The collection of icons on the
left of the main Glade interface is called the Tool Palette,
and it provides a wide range of widgets you can use in
your application. Simply click on a widget, and then click
in your application window to add it.

To the right of the Glade interface are two main areas.
At the top, you can see the widget hierarchy. This shows
that widgets are part of other widgets. Many widgets act as
containers for others. As an example, a button typically has a
label on it with some text and the label (a gtk.Label) sits inside
the button (a gtk.Button).

Below the widget hierarchy is a collection of tabs that
all reflect settings for the currently selected widget. As an
example, if you click on the image of the circle of friends (the
Ubuntu logo) in your application interface in Glade, you can
see the contents of the widget settings area adjust to show
the available settings for a GTK image widget. If you click on
the text above the image (which is called a GTK Label), you
will see the settings reflect that widget too.

Now, let's adjust the interface to reflect this simple
application. Having clicked on the label, change the text
that is displayed by looking at the widget settings on the
right of the main Glade window, and look for the label
option. In there, delete the existing text, and enter the
following text: "Type in a search term below:". You should
see the label in your user interface change.

With the label complete, you don't really need the circle of
friends image, so click it and press the delete key. When the
image is deleted, you will see a gray space open up behind it.
This is an empty part of your interface where you can put
another widget. It's also rather convenient, because you will
want to fill this space with a text entry widget where your
users can type in their search terms.

To add a widget, use the tool palette area on the left
side of the main Glade interface window. In the Control
and Display section, hover over the icons until you find
the Text Entry item (typically, it's the third icon down on
the left). Click it, and then click in the gray space that
opened up when you deleted the image. You now should
see the text entry appear, and your user interface should
look like Figure 3.

Figure 3. Text Entry

With the widget there, you should name it. All widgets 
in your interface can be referenced throughout your 
code, and you often will use this name to reference them. 
To do this, go to the widget settings area on the right 
of the Glade interface, and in the Name option, enter 
"search_box" as the name. You can call the widget 
whatever you like, but I usually refer to what it does (for
example, search) and then use an underscore and add the
description of the widget (for example, "box" for a text 
box). This makes it easy to determine what the widgets do 
when reading your code. 

Connecting Clicks to Code 

Before continuing, let's take a brief break from the tools to 
discuss a key aspect of how graphical programs work—a 
technique known as Event Driven Programming. It's a fairly 
simple idea. When users interact with one of the widgets 
in your program, it will trigger behavior that you want. For 
instance, in this example program, you want users to enter a 
search term into the text box, and when they press the Enter 
key, the program will search for the term in a Web browser. 

When you interact with a widget in a certain way, it 
generates a signal to indicate what you did to the widget. In 
this case, you are interacting with a text box widget, and there 
are a range of signals for different ways of interacting with 
it, such as copying text to the clipboard, pasting text in there, 
moving the cursor with the arrow keys, typing in a letter 
and more. This example application is specifically intended to 
search Google when users press the Enter key (pressing Enter 
typically indicates they have finished typing), so it's a good 
time to trigger the desired behavior. 

The way this works is you will use Glade to specify which 
handler in your program code you want to call when a particular 
signal is generated. In this case, the signal that is generated 
when you press the Enter key is called activate, and soon you
are going to create a handler called search_for_term in your
code to respond to the signal. 

To make this connection, ensure that the text box 
currently is highlighted in Glade, and in the widget settings, 
click the Signals tab. There you will see a list of signals in 
the Signal column. Now, click in the space to the right of 
the activate signal, and in the Handler column, enter
"search_for_term" as your handler. Now, click File→Save
to save your work in Glade. 
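
To make the signal-and-handler idea concrete, here is a toy model written in plain Python 3. It is only a sketch and deliberately does not use GTK: the ToyEntry class and its connect and emit methods are hypothetical stand-ins for what the toolkit and Glade do for you, and the handler records the search term in a list instead of opening a browser. (GTK names the Enter-key signal of a text entry activate.)

```python
# Toy model of signal/handler dispatch; this is NOT the GTK API.
class ToyEntry:
    """A stand-in for a text entry widget (hypothetical class)."""

    def __init__(self):
        self.text = ""
        self._handlers = {}  # signal name -> list of callables

    def connect(self, signal, handler):
        # Register a handler to run whenever `signal` is emitted.
        self._handlers.setdefault(signal, []).append(handler)

    def emit(self, signal):
        # Call every handler registered for `signal`, passing the widget.
        for handler in self._handlers.get(signal, []):
            handler(self)

    def get_text(self):
        return self.text


searched = []


def search_for_term(widget):
    # Stand-in handler: record the term instead of opening a browser.
    searched.append(widget.get_text())


entry = ToyEntry()
entry.connect("activate", search_for_term)
entry.text = "chickens"
entry.emit("activate")  # simulates the user pressing Enter
entry.emit("changed")   # no handler registered, so nothing happens
print(searched)
```

The point of the design is that the widget knows nothing about searching; it only announces what happened, and whatever handler you attached decides what to do about it.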

Writing Some Code 

With the user interface complete, now let's write the 
search_for_term handler that performs the search. To edit your 
program's code, simply use the edit command in the terminal: 

quickly edit 

This will fire up each of your source files in your project 
into the default Ubuntu text editor, gedit. A number of different 
source files will load, but most of the action happens in the 
myapp file. This is the main Python program that is executed 
when you run quickly run. 

The code you need to write to take the term entered 
into the search box and search Google with it is pretty 
simple, and you can use the webbrowser Python module 
to help. In the myapp file, after the import gtk line add: 
import webbrowser. 

This imports the webbrowser Python module, which loads 
URLs into the system's default Web browser.

Now, scroll down to the on_destroy handler, and add
the following after its gtk.main_quit() line:

def search_for_term(self, widget, data=None):
    """Search for the term entered"""
    searchurl = "http://www.google.com/#hl=en&source=hp&q="
    searchterm = searchurl + widget.get_text()
    webbrowser.open_new_tab(searchterm)

Here, you add the search_for_term handler, and it has 
three arguments that are passed to it: 

■ self: all instance methods are passed self; this is normal Python.

■ widget: this is a reference to the widget that called the
handler. You can use this to get information from the text 
entry widget. 

■ data=None: when you call a handler, you can pass it additional 
data if you like, but you can ignore this for this example. 

Figure 4. Myapp Main Run

When this handler runs, first construct the final search
term by concatenating "http://www.google.com/#hl=en&source=hp&q="
and the content that was typed
into the search box. To get this content, use the widget 
that was passed to the handler automatically. This is a 
reference to the text entry widget and can be used to run 
any of the methods that are part of the gtk.Entry text 
entry widget. One such method available is get_text(), 
which simply returns the text that was entered. As such, 
concatenate this with the Google URL, and you now have 
a complete URL you can pass to the Web browser. For 
example, if you typed in "chickens", the full URL would be 
"http://www.google.com/#hl=en&source=hp&q=chickens". 

To pass the URL to the browser, use the webbrowser 
module and its open_new_tab() method that opens a new 
tab with the URL that you pass it. 
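
One caveat with plain concatenation is that a search term containing spaces or characters such as & ends up in the URL unescaped. The following is a minimal sketch of one way to handle that, assuming Python 3, where quote_plus lives in urllib.parse (in Python 2 it is urllib.quote_plus). The build_search_url helper is a name invented for this illustration and is not part of the generated project.

```python
from urllib.parse import quote_plus

SEARCH_URL = "http://www.google.com/#hl=en&source=hp&q="


def build_search_url(term):
    """Return the search URL for term, escaping characters unsafe in URLs."""
    return SEARCH_URL + quote_plus(term)


print(build_search_url("chickens"))
print(build_search_url("free range chickens & eggs"))
```

Inside the handler, you would pass the result of build_search_url(widget.get_text()) to webbrowser.open_new_tab() instead of the raw concatenation.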

With the code complete, let's run it to double-check that 
everything works: 

quickly run 

You now should see something similar to Figure 4 in which 
you can type some text, press Enter and see the results in your 
browser. If you see some errors in your terminal, be sure to 
double-check that you typed in everything correctly. 

Quickly is an incredibly simple and powerful tool for generating
applications, and I barely have scratched the surface of
what is possible with it. You can find out more about using
Quickly by visiting wiki.ubuntu.com/Quickly.


Jono Bacon is the Ubuntu Community Manager at Canonical, author of The Art Of Community
published by O’Reilly, founder of the Community Leadership Summit and co-presenter on
Shot of Jaq and FLOSS Weekly.


Server Monitoring with Zabbix

Think implementing a large monitoring service is tedious? Not so with Zabbix. Start
monitoring several dozen critical hardware components and services quickly. Paul Tader


Zabbix (www.zabbix.com) is an open-source, commercially 
backed monitoring solution that supports UNIX, Linux, BSD, 
Mac OS X and Windows platforms and is built to support 
large installations. Zabbix is the creation of Alexei Vladishev 
and his company Zabbix SIA. This article is based on the latest 
version of Zabbix, version 1.8.1, which was released in January 
2010. At the time of this writing, most Linux distributions 
include the previous version (1.6) in their repositories. There 
are significant changes when compared to 1.8, but most of 
this article still applies. A partial feature list includes: 

■ Distributed monitoring. 

■ Clients for Linux, BSD, Windows, Mac OS X and 
commercial UNIXes. 

■ Database back end (MySQL, Oracle, PostgreSQL or SQLite). 

■ Auto-discovery mode. 

■ Web-based interface. 

■ Notifications via e-mail, SMS or Jabber. 

■ Support for polling or trapping Zabbix client messages. 

■ SNMP. 

■ Agent-less monitoring (ping, port checks and so on). 

■ Graphs. 

Although you can install Zabbix from your Linux distribution's
repositories, I'm going to install version 1.8.1 from
source using Ubuntu 9.10 for the server platform with a 
MySQL database back end. I also show how to configure 
a Linux client with the basic monitoring that comes with a 
default Zabbix installation. 

Prerequisite Applications 

Before compiling the Zabbix sources, I need to install 
prerequisite packages: 

shell> sudo apt-get install mysql-server apache2 \
    libapache2-mod-php5 php5-mysql php5-gd \
    libmysqlclient15-dev libsnmp-dev libiksemel-dev \
    libcurl4-gnutls-dev

(In Ubuntu, the package names are mysql-server, apache2,
libapache2-mod-php5, php5-mysql, php5-gd, libmysqlclient15-dev,
libsnmp-dev, libiksemel-dev and libcurl4-gnutls-dev.)

Installation 

The Zabbix server and client will run as the user zabbix, so you
need to create an account: 

shell> sudo useradd -s /bin/true zabbix

Next, create the zabbix database: 

shell> mysql -u<username> -p<password>
mysql> create database zabbix;
mysql> quit;

Download the source code from www.zabbix.com/ 
download.php, uncompress the archive and then follow 
the steps below to set up the database schema and default 
configuration. Note that I am using the MySQL schema files 
to set up my database; there are different schema files for 
the other supported databases: 

shell> sudo tar zxvf zabbix-1.8.1.tar.gz
shell> cd zabbix-1.8.1/create/schema
shell> cat mysql.sql | mysql -u<username> -p<password> zabbix
shell> cd ../data
shell> cat data.sql | mysql -u<username> -p<password> zabbix
shell> cat images_mysql.sql | mysql -u<username> -p<password> zabbix

To compile the server code, cd back to the root of the
extracted zabbix-1.8.1 source directory, and run the following
commands to build the server binaries with support for
MySQL, SNMP and Jabber:

shell> ./configure --enable-server --with-mysql --with-net-snmp \
    --with-jabber --with-libcurl
shell> sudo make install

shell> ./configure --enable-agent --enable-static
shell> sudo make install

I recommend building static binaries for the clients. This
helps when deploying the client across different (Linux)
versions. Although the make install command builds and
installs the server binary zabbix_server, it does not build the 
client agent binary. To compile the client binaries, cd into 
zabbix-1.8.1/src/zabbix_agent, and run another make install. 
The binaries then are installed into /usr/local/sbin—the same 
location for the server binary: 

shell> ./configure --enable-agent --enable-static 
shell> cd src/zabbix_agent 
shell> sudo make install 



Figure 1. Zabbix Introduction Screen 


Two zabbix binaries are compiled: zabbix_agentd and 
zabbix_agent. The latter is used to run the client from a 
superserver, such as inetd, and the former runs as a daemon. 
It's recommended to run the zabbix_agentd. 


Server and Client Configurations 

Zabbix uses one configuration file for the server and another 
for the client. Sample configuration files are available in 
the zabbix-1.8.1/misc/conf directory. Make a directory 
called /etc/zabbix, change the ownership of the directory 
to the user zabbix, and copy the zabbix_server.conf and 
zabbix_agentd.conf files to this directory. 

There isn't much to change in either of the configuration 
files, but they are well documented within the files themselves. 
Two configuration parameters in the client zabbix_agentd.conf 
file that should be changed are the lines Server= and Hostname=. 
The first should point to your Zabbix server and the second 
should be the hostname of the client. 
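
If you have more than a couple of clients, editing Server= and Hostname= by hand gets old. The following Python 3 sketch shows one way the edit could be scripted. The set_params function, the sample text and the host names zabbix.example.com and web01 are all made up for illustration; it works on a string so you can try it without touching a real zabbix_agentd.conf.

```python
import re


def set_params(conf_text, params):
    """Return conf_text with each 'Name=value' line set from params.

    Commented-out lines are left alone; a parameter that does not
    appear in the text is appended at the end.
    """
    lines = conf_text.splitlines()
    for name, value in params.items():
        pattern = re.compile(r"^\s*" + re.escape(name) + r"\s*=")
        new_line = "{}={}".format(name, value)
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = new_line  # replace the first active setting
                break
        else:
            lines.append(new_line)   # parameter was not present
    return "\n".join(lines) + "\n"


# Hypothetical sample content and host names.
sample = "# Sample agent settings\nServer=127.0.0.1\nHostname=localhost\n"
print(set_params(sample, {"Server": "zabbix.example.com", "Hostname": "web01"}))
```

To apply it for real, you would read the file's contents, pass them through set_params and write the result back, ideally after keeping a copy of the original.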

With the exception of maybe the DBUser and DBPassword 
parameters in the zabbix_server.conf file, nothing else needs to be 
changed if you're running a site with only a few hosts. Look
through both configuration files and refer to the Zabbix
documentation for any variables that could be helpful to your site.

Startup Scripts 

You can find several startup script examples within the 
zabbix-1.8.1/misc/init.d directory. Copy the one for your installation
to /etc/init.d, and make any necessary changes. For Ubuntu, I
used the scripts located in the debian directory. In both the server
and agent scripts, I needed to change the location of
the binary from /home/zabbix/bin to /usr/local/sbin.

Zabbix Web Front End 

The zabbix-1.8.1/frontends/php directory contains the
Web-based front end to Zabbix. Copy this directory structure
somewhere below Apache's DocumentRoot, and load that
URL in your Web browser. You will be greeted with the Zabbix
Introduction screen (Figure 1). This wizard-like page steps
through your configuration and presents you with a License
Agreement. The next screen details any configuration changes
that need to be made before continuing, such as PHP memory
and execution time settings.

Figure 2. Front-End Options

Figure 3. Admin User Configuration Page

Once past the configuration screen, the main login 
screen loads. The default account is Admin with the 
password zabbix. Of course, once you're logged in, change 
the default password. The front-end layout consists of two
rows of options (Figure 2). Click Administration, then
Users. Make sure the pull-down menu located on the
right side of the screen has Users selected instead of User
Groups. Next, click on the Admin user. A configuration
page for the Admin user is shown (Figure 3). First, change 
the password. Also, add an e-mail address (click Add next 
to the Media line), as we're going to configure alerts to 
be sent via e-mail later in this article. 

Adding a Host 

Three files need to be copied to a new client: the zabbix_agentd 
client binary to /usr/sbin, the zabbix_agentd.conf configuration 
file to /etc/zabbix and an init script. Edit the zabbix_agentd.conf 
configuration file, and change the line that reads Server= to 
equal the Zabbix server name, and change the Hostname= line 
to equal the client hostname. Once completed, start up the 
zabbix agent with the init script. 

Back on the Zabbix server Web page, click Configuration→Hosts
within the Web front end. Make sure Hosts is selected
in the pull-down menu on the right-hand side of your 
screen, and then click the Create Host button. The Hosts 
configuration screen appears (Figure 4). You can give your 
host any name you choose, but I recommend staying with 
the short hostname (hostname -s) instead of a fully qualified
domain name if you can. Add it into the Linux servers group,
and populate the DNS name with the fully qualified DNS
name. I could choose to monitor this host with its IP
address, but I'll trust that DNS always will be up to date.
The only other change to this page is to click Add under
the Linked templates area. Click the radio button next
to Template_Linux and choose Select at the bottom of
this pop-up window. Back at the Host screen, click Save.

All the monitoring Items and Triggers included in the
Template_Linux will be added to the client.

The Zabbix monitoring structure starts with Items (checks 
or collects data), then Triggers (monitors data in Items) and 
finally, Actions (e-mail, SMS or run scripts). 
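
As a rough mental model of that chain, here is a toy sketch in Python 3. None of this is Zabbix code: the function names, the canned percent-free readings and the threshold of 10 are assumptions picked for illustration, and the "action" just appends a message to a list rather than sending e-mail.

```python
# Toy model of the Item -> Trigger -> Action chain; not Zabbix code.

def collect_item(values):
    """Item: collect data. Here the 'check' just returns canned readings."""
    return list(values)


def low_disk_trigger(history, threshold=10):
    """Trigger: fire when the most recent value is below the threshold."""
    return bool(history) and history[-1] < threshold


def notify(message, outbox):
    """Action: stand-in for sending e-mail or SMS; appends to a list."""
    outbox.append(message)


outbox = []
history = collect_item([42.0, 17.5, 8.2])  # percent free on /, made up
if low_disk_trigger(history):
    notify("Low free disk space on volume /", outbox)
print(outbox)
```

Each stage only looks at the output of the one before it, which is why the same Item can feed several Triggers, and the same Trigger can drive several Actions.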

Items 

Items can be considered the "data collectors". Some items are 
built in to the agent binary, and others will be custom scripts. 
After installing Zabbix, you will have a range of templates that
contain these Items for common operating system checks,
such as Linux, Solaris, Mac OS X and Windows systems.

Figure 4. Host Configuration Screen

Let's look at the template we used with our first client. 
Using the Global search box at the upper right-hand side, 
search for "Template_Linux". The search results should
return a page that has links to the Items, Triggers or
Graphs for this template (Figure 5). Select the Items link.
All these Items will be monitored on any host that has the
Template_Linux template applied to it, such as the first
host configured above.

Click the Item called Free disk space on /. Here are all the
details for this Item (Figure 6). Most fields are self-explanatory,
but here are a few important ones:

■ Description: a free form field that describes the check. 
Note that in the free disk space check there is a $1. Zabbix 
replaces this with the first field in the key (explained later). 

■ Type: a Zabbix agent type is a check performed by the
agent running on the client at defined intervals. A Zabbix
agent check is either compiled into the binary, such as checking
free disk space or the number of free/used inodes, or is a
custom-written script. Another type is a Zabbix Trapper. A Zabbix
Trapper acts like an SNMP trap. Its value is updated only
when the client sends the update by running the binary
zabbix_sender. For example, say you have a cron job that
takes 30 minutes to finish. Normally, the Zabbix server will
time out waiting for a response from the client running this
script. A better way would be to add a line in the cron job
script to update the Zabbix server when it's finished using
the zabbix_sender program. Another type of check is called
Simple checks. This is used for agent-less clients, for example,
pinging a host or checking a specific port (e-mail, SSH and
so on) with an external host.

■ Key: this field is the "expression" that 
Zabbix will check. It can be a built-in 
key, such as the free disk space Item 
(vfs.fs.size[/,free]) or a custom script 
that you wrote. The documentation 
details all the built-in keys and 
expressions that can be used. 
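
The cron job case described under Type can be sketched in Python. 
This is a minimal illustration, not part of the setup above: the 
server name zabbix.example.com, the host name client1 and the key 
backup.status are hypothetical, and the sketch assumes a Zabbix 
Trapper Item with that key already exists for the host. The -z, -s, 
-k and -o options are zabbix_sender's server, host, key and value 
options. 

```python
import subprocess


def build_sender_command(server, host, key, value, binary="zabbix_sender"):
    """Return the argument list for a single zabbix_sender call."""
    return [binary, "-z", server, "-s", host, "-k", key, "-o", str(value)]


def report_job_finished(server, host, key, exit_status, binary="zabbix_sender"):
    """Push the job's exit status to the Zabbix server.

    Returns True if zabbix_sender ran and exited with status 0,
    False if it failed, timed out or could not be started.
    """
    cmd = build_sender_command(server, host, key, exit_status, binary)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A long-running job would call something like 
report_job_finished("zabbix.example.com", "client1", "backup.status", 0) 
as its last step, so the server receives the value when the job is 
done instead of waiting on the agent. 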


You also can tell Zabbix what type of data is going to be 
returned: text, characters or numbers and a multiplier for 
that value. Also, you can specify for how long you want 
fine-grained graphs (history) and trends. The Applications 
section is where you can group similar checks. For example, 
if you were adding another filesystem item, you would add 
it to the Filesystem application. 

Figure 5. Template_Linux 

Figure 6. Free disk space on / Item 

Triggers 

Select the Triggers link from your Global search results 
(Figure 5). A trigger in Zabbix monitors the data that the 
Items collect. If the data crosses a configured threshold, the 
trigger fires with its assigned severity, one of six levels. 
Figure 7 shows the triggers that come with Template_Linux. 
Displayed are the severity level, status, description and an 
expression that makes up the trigger. Click on the trigger 
named Low free disk space on Template_Linux volume /, and 
the trigger configuration screen should come up (Figure 8). 

The first field, the Name field, should describe the problem. 
For instance, "IMAP port not responding on server123" is 
better than "E-mail down". This most likely will be the text 
that you're going to receive in an e-mail, page or SMS message, 
so a clear, descriptive name will be very helpful at 2am should 
that call occur. 

The Expression field defines which Item this Trigger 
monitors and what its thresholds are. Our 
expression for this trigger is configured with 
"{Template_Linux:vfs.fs.size[/,pfree].last(0)}<10", 
which loosely reads, "Monitor the host called Template_Linux 
and its key vfs.fs.size[/,pfree]. If the last value it returned is 
less than 10, assign it a severity level of High." Click Select. 
From here, you can change the expression to trigger on 
averages, absolute values or maximum values for a period 
of time. For now, I'll leave the trigger function as is, except 
I want to change at what value it triggers. So close the 
Condition pop-up window and change the expression to 
5% by changing the value from 10 to 5 at the end of the 
line. Click Save to make the changes. 
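
To make the reading of the expression concrete, here is a small 
Python sketch that splits an expression of this one shape into its 
parts and applies the comparison to a sample value. It is an 
illustration only, not Zabbix's own parser: the function names are 
made up for this example, and it handles just a single last() 
function followed by a <, > or = comparison. 

```python
import operator
import re

# {host:key.function(argument)}<operator><threshold>
TRIGGER_RE = re.compile(
    r"^\{(?P<host>[^:{}]+):(?P<key>.+)\.(?P<function>\w+)\((?P<argument>[^()]*)\)\}"
    r"(?P<op>[<>=])(?P<threshold>-?\d+(?:\.\d+)?)$"
)

COMPARISONS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def parse_trigger(expression):
    """Split a simple trigger expression into its named parts."""
    match = TRIGGER_RE.match(expression.strip())
    if match is None:
        raise ValueError("unsupported trigger expression: %r" % expression)
    parts = match.groupdict()
    parts["threshold"] = float(parts["threshold"])
    return parts


def fires(expression, last_value):
    """Return True if last_value would activate the trigger."""
    parts = parse_trigger(expression)
    if parts["function"] != "last":
        raise NotImplementedError("only last() is handled in this sketch")
    return COMPARISONS[parts["op"]](last_value, parts["threshold"])
```

With 7.5% of the volume free, the original threshold of 10 
activates the trigger, and the edited threshold of 5 does not. 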

Actions 

Actions occur when a trigger is activated. An action can 
send an e-mail, Jabber or SMS message, or run a remote 
script. Let's configure an action to e-mail the admin if any 
trigger with level Disaster has been activated. Select 
Configuration→Actions, and then the Create Action 
button on the right-hand side of the screen. The 
Configuration of Actions screen should be visible (Figure 
9). Name it something helpful, then click the New button 
under Action conditions. Choose Trigger severity from the 
New Condition area, and change the severity level from 
Information to Disaster. Click Add when finished. Next, 
select the New button in the Action operations area. 


Figure 9. Configuration of Actions Screen 

Figure 10. Configuring the Operation 


Configure the operation to send a message to a single user 
named admin (Figure 10). Click Add when done. Finally, 
click the Save button. Now, any trigger that you assign the 
severity level of Disaster will result in an e-mail being sent 
to the Admin user. You can create Actions for a single 
trigger from a specific host if needed, but the action above 
can be treated like a "site-wide" action. 
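
The way a severity condition selects triggers can be modeled in a 
few lines of Python. This is a rough sketch rather than anything 
Zabbix runs: the Action class and the recipients function are 
invented for illustration, and the six level names are Zabbix's 
standard ones (the text above names only Information, High and 
Disaster). 

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """The six trigger severity levels, lowest to highest."""
    NOT_CLASSIFIED = 0
    INFORMATION = 1
    WARNING = 2
    AVERAGE = 3
    HIGH = 4
    DISASTER = 5


@dataclass(frozen=True)
class Action:
    """One action with a single 'Trigger severity = X' condition."""
    name: str
    severity: Severity
    recipient: str


def recipients(actions, trigger_severity):
    """Return the users to notify for a trigger of the given severity."""
    return [a.recipient for a in actions if a.severity == trigger_severity]


site_wide = Action("E-mail admin on Disaster", Severity.DISASTER, "admin")
```

Note that the Low free disk space trigger edited earlier has 
severity High, so this particular action would not send mail for it. 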

Daily Monitoring 

There are several ways to monitor the clients you have 
configured. One of the screens I find most informative is the 
Status of Triggers Web page. Click Monitoring→Triggers (make 
sure Group and Host have "all" listed from the pull-down on 
the right-hand side of the screen). On this screen, Zabbix 
lists all the triggers that have been activated, their assigned 
severity level, the date of last change and a short description, 
as well as Acknowledged and Comments columns. This could be 
considered a sysadmin's to-do list. 


Paul Tader is an independent consultant implementing open-source solutions in the Chicago area, 
where he has run every Linux and BSD flavor since the mid-1990s as well as instructing Linux 
certification courses at a local college. Feel free to contact Paul at ptader@linuxscope.com. 
Markets in Three Dimensions 

Moving beyond the horizontal. DOC SEARLS 



Old analog televisions had controls 
called horizontal and vertical. One 
kept the picture from falling over 
sideways, while the other kept the 
picture from sliding off the top or 
the bottom of the screen. 

I think we need similar controls in 
our heads when we look at the product 
categories we call markets. 

In the world of free software and 
open source, we mostly care about 
the horizontal: creating code, standards 
and other means for expanding 
markets sideways, toward the horizons. 
We work to keep both developers 
and users free from capture in the 
vertical markets we call silos, so they 
can do more stuff in more ways. We 
don't often ask "Where would we 
be without silos?" because we know. 
We would be free to build and use 
anything we want, out in the market's 
wide-open spaces. 

Still, innovation happens in silos 
too. Many of the technical graces 
we take for granted would not have 
happened outside the walls of silos 
built by Apple, Canon, IBM, Intel, 
Sony and other large companies with 
ample and hardened intellectual 
property portfolios. Linux itself was 
developed originally (and still primarily) 
for Intel's x86 CPUs. Much of what 
we take for granted in chips from 
Intel and other makers is thick with 
intellectual property protections we 
would hate to see applied to our own 
software work. Every large maker of 
original electronic products (including 
all the companies listed above) produces 
between dozens and thousands 
of new patents every year, adding to 
portfolios that muscle licensing 
income and deals of many other 
kinds. Occasionally, companies do 
battle in court, but most of the time 
the dealing is quiet. If revealed, it is 
only through pro forma small-print 
disclosures in documentation. 


What matters is that these portfolios 
give large makers the confidence and 
security they feel is required to produce 
original and appealing goods for which 
there is little or no competition. This is 
an ideal to which Apple, for example, 
constantly aspires. Back in 1997, not 
long after Steve Jobs returned to lead 
Apple after a long interregnum, he 
killed off cloners of the company's 
computers. Here's what I wrote about it 
at the time in an e-mail to Dave Winer 
(which Dave later published): 

To Steve, clones are the drag of 
the ordinary on the innovative. 

All that crap about cloners not 
sharing the cost of R&D is just 
rationalization. Steve puts enormous 
value on the engines of 
innovation. Killing off the cloners 
just eliminates a drag on his 
own R&D, as well as a way to 
reposition Apple as something 
closer to what he would have 
made the company if he had 
been in charge through the 
intervening years.... 

Now Steve is back, and gradually 
renovating his old company. 

He'll do it his way, and it will 
once again express his Art. 

These things I can guarantee 
about whatever Apple makes 
from this point forward: 

1. It will be original. 

2. It will be innovative. 

3. It will be exclusive. 

4. It will be expensive. 

5. Its aesthetics will be impeccable. 

6. The influence of developers, 
even influential developers 
like you, will be minimal. The 
influence of customers and 
users will be held in even 
higher contempt. 

The iPod, iPhone and iPad each not 
only fulfilled all six of those requirements, 
but also redefined their market 
categories in the vertical dimension. 

That is, they grew the range of things 
that could be done with a given device, 
and the size of its marketplace. The 
iPhone in particular redefined the 
smartphone market and enlarged it far 
beyond the narrow range of possibilities 
allowed by combinations of mobile 
phone makers and mobile phone 
companies. New kinds of applications by 
the thousands burst out of the ground 
like a geyser. But all are contained 
inside Apple's silo, where they move 
only through the company's sphinctered 
approval and sales processes. 

In fact, far more can be done on 
Linux (notably Android) and Symbian 
OSes than on iPhone's, just given the 
open nature of the former and the 
closed nature of the latter. But, thanks 
to Apple, there is much more to imagine 
doing outside that company's closed 
and private silo. 

This is the point at which some 
suggest that open-source goods are 
derivative, rather than original. But that's 
not the case. We still happen to live in 
a time when investment in closed and 
original stuff is greater than investment 
in the open and original kind. As long 
as that's still the case, we'll have the 
Apples of the world aiming for the heights 
while the rest of us build out the 
widths and the depths that characterize 
wide-open marketplaces. 


Doc Searls is Senior Editor of Linux Journal. He is also a 
fellow with the Berkman Center for Internet and Society at 
Harvard University and the Center for Information Technology 
and Society at UC Santa Barbara. 